Importing the required libraries¶

Reading data present in the CSV file¶

(583, 11)
Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 583 entries, 0 to 582
Data columns (total 11 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Age                         583 non-null    int64  
 1   Gender                      583 non-null    object 
 2   Total_Bilirubin             583 non-null    float64
 3   Direct_Bilirubin            583 non-null    float64
 4   Alkaline_Phosphotase        583 non-null    int64  
 5   Alamine_Aminotransferase    583 non-null    int64  
 6   Aspartate_Aminotransferase  583 non-null    int64  
 7   Total_Protiens              583 non-null    float64
 8   Albumin                     583 non-null    float64
 9   Albumin_and_Globulin_Ratio  579 non-null    float64
 10  Dataset                     583 non-null    int64  
dtypes: float64(5), int64(5), object(1)
memory usage: 50.2+ KB
Index(['Age', 'Gender', 'Total_Bilirubin', 'Direct_Bilirubin',
       'Alkaline_Phosphotase', 'Alamine_Aminotransferase',
       'Aspartate_Aminotransferase', 'Total_Protiens', 'Albumin',
       'Albumin_and_Globulin_Ratio', 'Dataset'],
      dtype='object')
Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1
... ... ... ... ... ... ... ... ... ... ... ...
578 60 Male 0.5 0.1 500 20 34 5.9 1.6 0.37 2
579 40 Male 0.6 0.1 98 35 31 6.0 3.2 1.10 1
580 52 Male 0.8 0.2 245 48 49 6.4 3.2 1.00 1
581 31 Male 1.3 0.5 184 29 32 6.8 3.4 1.00 1
582 38 Male 1.0 0.3 216 21 24 7.3 4.4 1.50 2

583 rows × 11 columns
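
The outputs above (shape, first rows, `info()`, column labels) could be produced along the following lines. This is a minimal sketch, not the notebook's original cell: the CSV path is unknown, so a few of the rows shown above are fed in through `io.StringIO` as a stand-in, and the name `liver_df` is borrowed from the pie-chart snippet later in the notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: the first five rows displayed above.
csv_text = """Age,Gender,Total_Bilirubin,Direct_Bilirubin,Alkaline_Phosphotase,Alamine_Aminotransferase,Aspartate_Aminotransferase,Total_Protiens,Albumin,Albumin_and_Globulin_Ratio,Dataset
65,Female,0.7,0.1,187,16,18,6.8,3.3,0.90,1
62,Male,10.9,5.5,699,64,100,7.5,3.2,0.74,1
62,Male,7.3,4.1,490,60,68,7.0,3.3,0.89,1
58,Male,1.0,0.4,182,14,20,6.8,3.4,1.00,1
72,Male,3.9,2.0,195,27,59,7.3,2.4,0.40,1
"""

# With the real file, pd.read_csv would be given the path to the CSV instead.
liver_df = pd.read_csv(io.StringIO(csv_text))

print(liver_df.shape)    # (rows, columns)
print(liver_df.head())   # first five rows
liver_df.info()          # dtype and non-null count per column
print(liver_df.columns)  # column labels
```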

Creating a copy of the dataframe¶

Data Cleaning¶

Checking for the null values¶

Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset
0 False False False False False False False False False False False
1 False False False False False False False False False False False
2 False False False False False False False False False False False
3 False False False False False False False False False False False
4 False False False False False False False False False False False
... ... ... ... ... ... ... ... ... ... ... ...
578 False False False False False False False False False False False
579 False False False False False False False False False False False
580 False False False False False False False False False False False
581 False False False False False False False False False False False
582 False False False False False False False False False False False

583 rows × 11 columns

Age                           0
Gender                        0
Total_Bilirubin               0
Direct_Bilirubin              0
Alkaline_Phosphotase          0
Alamine_Aminotransferase      0
Aspartate_Aminotransferase    0
Total_Protiens                0
Albumin                       0
Albumin_and_Globulin_Ratio    4
Dataset                       0
dtype: int64
Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset
209 45 Female 0.9 0.3 189 23 33 6.6 3.9 NaN 1
241 51 Male 0.8 0.2 230 24 46 6.5 3.1 NaN 1
253 35 Female 0.6 0.2 180 12 15 5.2 2.7 NaN 2
312 27 Male 1.3 0.6 106 25 54 8.5 4.8 NaN 2
Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1
... ... ... ... ... ... ... ... ... ... ... ...
578 60 Male 0.5 0.1 500 20 34 5.9 1.6 0.37 2
579 40 Male 0.6 0.1 98 35 31 6.0 3.2 1.10 1
580 52 Male 0.8 0.2 245 48 49 6.4 3.2 1.00 1
581 31 Male 1.3 0.5 184 29 32 6.8 3.4 1.00 1
582 38 Male 1.0 0.3 216 21 24 7.3 4.4 1.50 2

579 rows × 11 columns

Age                           0
Gender                        0
Total_Bilirubin               0
Direct_Bilirubin              0
Alkaline_Phosphotase          0
Alamine_Aminotransferase      0
Aspartate_Aminotransferase    0
Total_Protiens                0
Albumin                       0
Albumin_and_Globulin_Ratio    0
Dataset                       0
dtype: int64
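
The null check above can be sketched as follows. The toy values are made up for illustration, and dropping the affected rows with `dropna()` is an assumption inferred from the row count falling from 583 to 579 while the original index labels are kept.

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative values only) with one missing ratio.
df = pd.DataFrame({
    "Albumin": [3.3, 3.9, 3.2],
    "Albumin_and_Globulin_Ratio": [0.90, np.nan, 0.74],
})

null_counts = df.isnull().sum()                 # missing values per column
rows_with_nulls = df[df.isnull().any(axis=1)]   # rows holding at least one NaN
cleaned = df.dropna()                           # assumed cleaning step: drop those rows

print(null_counts)
print(rows_with_nulls)
print(cleaned.isnull().sum())
```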

Checking for negative values¶

Age                           0
Total_Bilirubin               0
Direct_Bilirubin              0
Alkaline_Phosphotase          0
Alamine_Aminotransferase      0
Aspartate_Aminotransferase    0
Total_Protiens                0
Albumin                       0
Albumin_and_Globulin_Ratio    0
Dataset                       0
dtype: int64
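
One way to obtain per-column counts like those above is to compare only the numeric columns against zero, which also explains why Gender is absent from the output. A minimal sketch on made-up data:

```python
import pandas as pd

# Toy frame (illustrative values only) with one deliberately negative age.
df = pd.DataFrame({
    "Age": [65, 62, -1],
    "Gender": ["Female", "Male", "Male"],
    "Albumin": [3.3, 3.2, 3.4],
})

numeric = df.select_dtypes(include="number")  # skip text columns such as Gender
negative_counts = (numeric < 0).sum()         # number of negative entries per column

print(negative_counts)
```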

Checking for duplicate values¶

Viewing the duplicate values¶

13
Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset
19 40 Female 0.9 0.3 293 232 245 6.8 3.1 0.80 1
26 34 Male 4.1 2.0 289 875 731 5.0 2.7 1.10 1
34 38 Female 2.6 1.2 410 59 57 5.6 3.0 0.80 2
55 42 Male 8.9 4.5 272 31 61 5.8 2.0 0.50 1
62 58 Male 1.0 0.5 158 37 43 7.2 3.6 1.00 1
106 36 Male 5.3 2.3 145 32 92 5.1 2.6 1.00 2
108 36 Male 0.8 0.2 158 29 39 6.0 2.2 0.50 2
138 18 Male 0.8 0.2 282 72 140 5.5 2.5 0.80 1
143 30 Male 1.6 0.4 332 84 139 5.6 2.7 0.90 1
158 72 Male 0.7 0.1 196 20 35 5.8 2.0 0.50 1
164 39 Male 1.9 0.9 180 42 62 7.4 4.3 1.38 1
174 31 Male 0.6 0.1 175 48 34 6.0 3.7 1.60 1
201 49 Male 0.6 0.1 218 50 53 5.0 2.4 0.90 1

Removing duplicate values¶

Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1
... ... ... ... ... ... ... ... ... ... ... ...
561 60 Male 0.5 0.1 500 20 34 5.9 1.6 0.37 2
562 40 Male 0.6 0.1 98 35 31 6.0 3.2 1.10 1
563 52 Male 0.8 0.2 245 48 49 6.4 3.2 1.00 1
564 31 Male 1.3 0.5 184 29 32 6.8 3.4 1.00 1
565 38 Male 1.0 0.3 216 21 24 7.3 4.4 1.50 2

566 rows × 11 columns

(566, 11)
Index(['Age', 'Gender', 'Total_Bilirubin', 'Direct_Bilirubin',
       'Alkaline_Phosphotase', 'Alamine_Aminotransferase',
       'Aspartate_Aminotransferase', 'Total_Protiens', 'Albumin',
       'Albumin_and_Globulin_Ratio', 'Dataset'],
      dtype='object')
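
The duplicate handling above can be sketched as follows. The toy rows are illustrative, and `reset_index(drop=True)` is an assumption inferred from the index running from 0 to 565 after the removal.

```python
import pandas as pd

# Toy frame (illustrative values only): rows 1 and 4 repeat earlier rows.
df = pd.DataFrame({
    "Age": [40, 40, 34, 38, 38],
    "Gender": ["Female", "Female", "Male", "Female", "Female"],
    "Total_Bilirubin": [0.9, 0.9, 4.1, 2.6, 2.6],
})

n_duplicates = df.duplicated().sum()   # rows identical to an earlier row
duplicate_rows = df[df.duplicated()]   # view the repeated rows
deduplicated = df.drop_duplicates().reset_index(drop=True)  # keep first occurrence, renumber

print(n_duplicates)
print(duplicate_rows)
print(deduplicated.shape)
```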

Dealing with outliers¶

Creating Boxplot for viewing the outliers in Total Bilirubin and Direct Bilirubin¶

Text(0.5, 0.98, 'Boxplot of Total Bilirubin | Direct Bilirubin')

Creating Boxplot for viewing the outliers in Total Proteins, Albumin, Albumin and Globulin Ratio¶

Text(0.5, 0.98, 'Boxplot of Total Protiens | Albumin | Albumin and Globulin Ratio')

Creating Boxplot for viewing the outliers in Aspartate Aminotransferase, Alanine Aminotransferase, and Alkaline Phosphatase¶

Text(0.5, 0.98, 'Boxplot of Aspartate Aminotransferase | Alamine Aminotransferase | Alkaline Phosphotase')
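
Each of the three boxplot figures above puts several columns side by side under one figure-level title. A minimal sketch for the first figure, assuming plain matplotlib subplots and `fig.suptitle` (the original plotting code is not shown, and the toy values are the first five rows of the data):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Total_Bilirubin": [0.7, 10.9, 7.3, 1.0, 3.9],
    "Direct_Bilirubin": [0.1, 5.5, 4.1, 0.4, 2.0],
})

columns = ["Total_Bilirubin", "Direct_Bilirubin"]
fig, axes = plt.subplots(1, len(columns), figsize=(10, 4))
for ax, column in zip(axes, columns):
    ax.boxplot(df[column])   # points beyond the whiskers are drawn as outliers
    ax.set_title(column)

# Figure-level title shared by both panels.
title = fig.suptitle("Boxplot of Total Bilirubin | Direct Bilirubin")
```

The other two figures follow the same pattern with a different `columns` list and title.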

Converting the Dataset values 1 and 2 into "Patient with liver disease" and "Patient with no liver disease" respectively¶

Dataset Column

  • 1 - Patient with liver disease
  • 2 - Patient with no liver disease
Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset Dataset_Details
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1 Patient with liver disease
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1 Patient with liver disease
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1 Patient with liver disease
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1 Patient with liver disease
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1 Patient with liver disease

Converting Gender into F:0 and M:1 and creating a new column called Gender_Binary¶

Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset Dataset_Details Gender_Binary
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1 Patient with liver disease 0
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1 Patient with liver disease 1
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1 Patient with liver disease 1
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1 Patient with liver disease 1
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1 Patient with liver disease 1
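
Both derived columns, Dataset_Details and Gender_Binary, can be built with a dictionary lookup. A minimal sketch using `Series.map` on toy rows (the original cells may have used a different approach):

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Female", "Male", "Male"],
    "Dataset": [1, 1, 2],
})

# 1 -> patient with liver disease, 2 -> patient with no liver disease
df["Dataset_Details"] = df["Dataset"].map({
    1: "Patient with liver disease",
    2: "Patient with no liver disease",
})

# Female -> 0, Male -> 1
df["Gender_Binary"] = df["Gender"].map({"Female": 0, "Male": 1})

print(df)
```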

Exploratory Data Analysis (EDA)¶

Identifying the numerical columns¶

['Age',
 'Total_Bilirubin',
 'Direct_Bilirubin',
 'Alkaline_Phosphotase',
 'Alamine_Aminotransferase',
 'Aspartate_Aminotransferase',
 'Total_Protiens',
 'Albumin',
 'Albumin_and_Globulin_Ratio']
Age Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio
0 65 0.7 0.1 187 16 18 6.8 3.3 0.90
1 62 10.9 5.5 699 64 100 7.5 3.2 0.74
2 62 7.3 4.1 490 60 68 7.0 3.3 0.89
3 58 1.0 0.4 182 14 20 6.8 3.4 1.00
4 72 3.9 2.0 195 27 59 7.3 2.4 0.40
... ... ... ... ... ... ... ... ... ...
561 60 0.5 0.1 500 20 34 5.9 1.6 0.37
562 40 0.6 0.1 98 35 31 6.0 3.2 1.10
563 52 0.8 0.2 245 48 49 6.4 3.2 1.00
564 31 1.3 0.5 184 29 32 6.8 3.4 1.00
565 38 1.0 0.3 216 21 24 7.3 4.4 1.50

566 rows × 9 columns

Identifying the categorical columns¶

['Gender', 'Dataset']
Gender Dataset
0 Female 1
1 Male 1
2 Male 1
3 Male 1
4 Male 1
... ... ...
561 Male 2
562 Male 1
563 Male 1
564 Male 1
565 Male 2

566 rows × 2 columns
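
The split above treats Dataset as categorical even though it is stored as an integer. One possible rule that reproduces this, shown here as an assumption rather than the notebook's actual logic, is to call a column categorical when it is non-numeric or has at most two distinct values. `MAX_CATEGORY_LEVELS` is a hypothetical name.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    "Age": [65, 62, 58, 38],
    "Gender": ["Female", "Male", "Male", "Male"],
    "Total_Bilirubin": [0.7, 10.9, 1.0, 1.0],
    "Dataset": [1, 1, 1, 2],
})

MAX_CATEGORY_LEVELS = 2  # hypothetical threshold, not taken from the source

categorical_columns = [
    column for column in df.columns
    if not is_numeric_dtype(df[column]) or df[column].nunique() <= MAX_CATEGORY_LEVELS
]
numerical_columns = [column for column in df.columns if column not in categorical_columns]

print(numerical_columns)
print(categorical_columns)
```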

Generating descriptive statistics¶

count mean std min 25% 50% 75% max
Age 566.0 44.886926 16.274893 4.0 33.0 45.00 58.00 90.0
Total_Bilirubin 566.0 3.338869 6.286728 0.4 0.8 1.00 2.60 75.0
Direct_Bilirubin 566.0 1.505830 2.841485 0.1 0.2 0.30 1.30 19.7
Alkaline_Phosphotase 566.0 292.567138 245.936559 63.0 176.0 208.00 298.00 2110.0
Alamine_Aminotransferase 566.0 80.143110 182.044881 10.0 23.0 35.00 60.75 2000.0
Aspartate_Aminotransferase 566.0 109.892226 291.841897 10.0 25.0 41.00 87.00 4929.0
Total_Protiens 566.0 6.494876 1.087512 2.7 5.8 6.60 7.20 9.6
Albumin 566.0 3.145583 0.795745 0.9 2.6 3.10 3.80 5.5
Albumin_and_Globulin_Ratio 566.0 0.948004 0.319635 0.3 0.7 0.95 1.10 2.8
Dataset 566.0 1.286219 0.452393 1.0 1.0 1.00 2.00 2.0
Gender_Binary 566.0 0.756184 0.429763 0.0 1.0 1.00 1.00 1.0

Checking for correlation¶

Age Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset
Age 1.000000 0.011763 0.007529 0.080425 -0.086883 -0.019910 -0.187461 -0.265924 -0.216408 -0.137351
Total_Bilirubin 0.011763 1.000000 0.874618 0.206669 0.214065 0.237831 -0.008099 -0.222250 -0.206267 -0.220208
Direct_Bilirubin 0.007529 0.874618 1.000000 0.234939 0.233894 0.257544 -0.000139 -0.228531 -0.200125 -0.246046
Alkaline_Phosphotase 0.080425 0.206669 0.234939 1.000000 0.125680 0.167196 -0.028514 -0.165453 -0.234166 -0.184866
Alamine_Aminotransferase -0.086883 0.214065 0.233894 0.125680 1.000000 0.791966 -0.042518 -0.029742 -0.002375 -0.163416
Aspartate_Aminotransferase -0.019910 0.237831 0.257544 0.167196 0.791966 1.000000 -0.025645 -0.085290 -0.070040 -0.151934
Total_Protiens -0.187461 -0.008099 -0.000139 -0.028514 -0.042518 -0.025645 1.000000 0.784053 0.234887 0.035008
Albumin -0.265924 -0.222250 -0.228531 -0.165453 -0.029742 -0.085290 0.784053 1.000000 0.689632 0.161388
Albumin_and_Globulin_Ratio -0.216408 -0.206267 -0.200125 -0.234166 -0.002375 -0.070040 0.234887 0.689632 1.000000 0.163131
Dataset -0.137351 -0.220208 -0.246046 -0.184866 -0.163416 -0.151934 0.035008 0.161388 0.163131 1.000000
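
The summary statistics and the correlation matrix above have one row per column, which suggests a transposed `describe()` and a pairwise Pearson correlation over the numeric columns. A minimal sketch on the first five rows of two columns:

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Male", "Male"],
    "Total_Bilirubin": [0.7, 10.9, 7.3, 1.0, 3.9],
    "Direct_Bilirubin": [0.1, 5.5, 4.1, 0.4, 2.0],
})

numeric = df.select_dtypes(include="number")  # correlation is only defined for numeric columns
summary = numeric.describe().T                # one row per column: count, mean, std, quartiles
correlation = numeric.corr()                  # Pearson correlation by default

print(summary)
print(correlation)
```

In the matrix above, the strongest pairs are Total_Bilirubin with Direct_Bilirubin (0.87), Alamine_Aminotransferase with Aspartate_Aminotransferase (0.79), and Total_Protiens with Albumin (0.78).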

Correlation heatmap¶

Generating histogram plot as per the columns¶

Individual/Separate histogram¶

Histogram as per the age group¶

Text(0.5, 1.0, 'Histogram plot to represent the number of different age group of peoples')

Histogram regarding the rate of Alkaline Phosphatase¶

Text(0.5, 1.0, 'Histogram plot to represent the rate of Alkaline Phosphotase among the peoples')

Histogram regarding the rate of Aspartate Aminotransferase¶

Text(0.5, 1.0, 'Histogram plot to represent the rate of Aspartate Aminotransferase among the peoples')

Histogram regarding the rate of Alanine Aminotransferase¶

Text(0.5, 1.0, 'Histogram plot to represent the rate of Alamine Aminotransferase among the peoples')

Histogram regarding the rate of Total Proteins¶

Text(0.5, 1.0, 'Histogram plot to represent the rate of Total Protiens among the peoples')

Histogram regarding the rate of Albumin¶

Text(0.5, 1.0, 'Histogram plot to represent the rate of Albumin among the peoples')

Histogram regarding the rate of Albumin and Globulin Ratio¶

Text(0.5, 1.0, 'Histogram plot to represent the rate of Albumin and Globulin Ratio among the peoples')

Histogram regarding the rate of Total Bilirubin¶

Text(0.5, 1.0, 'Histogram plot to represent the rate of Total Bilirubin among the peoples')

Histogram regarding the rate of Direct Bilirubin¶

Text(0.5, 1.0, 'Histogram plot to represent the rate of Direct Bilirubin among the peoples')

Data Visualization¶

Creating the pairplot according to the liver and non-liver patients¶

Creating the pairplot according to the gender¶


The pairplot combines two kinds of figures: histograms and scatter plots. The histograms on the diagonal show the distribution of a single variable, while the scatter plots in the upper and lower triangles show the relationship between each pair of variables.

Total number of males and females in the dataset¶

Text(0.5, 1.0, 'Pie-chart representing number of male and female present in the dataset')

Total number of patients with liver diseases and without liver disease in the dataset¶

Text(0.5, 1.0, 'Pie-chart representing number of patients with liver diseases and without liver disease in the dataset')
plt.pie(liver_df.groupby('Dataset')['Dataset'].count(),
        explode=[0, 0.3],
        labels=liver_df.groupby('Dataset')['Dataset'].count().index,
        colors=["#996699", "#ff9999"],
        autopct="%.2f%%",
        radius=1.2,
        shadow=True
       )  # group the data by the Dataset column and count the rows per group for the pie chart
plt.title("Pie-chart representing number of patients with liver diseases and without liver disease in the dataset")  # define the title for the figure

Patients with liver diseases and without liver disease as per gender¶

Patients with and without liver disease according to the age¶

Top 10 age group patients having liver disease¶

Age Count
50 60 26
23 32 18
36 45 18
41 50 17
37 46 16
39 48 16
31 40 15
46 55 15
64 75 14
33 42 13

Top 10 age group patients who do not have the liver disease¶

Age Count
51 65 10
27 38 9
46 60 8
31 42 7
34 45 6
25 36 6
38 50 6
37 49 5
24 35 5
8 17 4
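
The two tables above list, for each patient group, the ages with the most patients. A minimal sketch on toy data, assuming a group-by on Age followed by a descending sort (`age_counts` is a hypothetical helper name):

```python
import pandas as pd

# Toy data (illustrative values only): Dataset 1 = liver disease, 2 = no liver disease.
df = pd.DataFrame({
    "Age": [60, 60, 60, 45, 45, 32, 65, 65, 38],
    "Dataset": [1, 1, 1, 1, 1, 1, 2, 2, 2],
})

def age_counts(frame, dataset_value):
    """Count patients per age within one Dataset group, largest count first."""
    subset = frame[frame["Dataset"] == dataset_value]
    counts = subset.groupby("Age").size().reset_index(name="Count")
    return counts.sort_values("Count", ascending=False)

with_disease = age_counts(df, 1).head(10)     # top 10 ages, liver disease
without_disease = age_counts(df, 2).head(10)  # top 10 ages, no liver disease

print(with_disease)
print(without_disease)
```

Because the index is reset before sorting, the leftmost number in each table is the row position in the age-ordered counts, not an age. Using `tail(20)` instead of `head(10)` would give the ages with the fewest patients.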

Top 10 age group patients having liver disease VS Top 10 age group patients who do not have the liver disease¶

Text(0.5, 0.98, 'Bar Graph of top 10 count of patients with and without the liver disease as per the age group')

Top 20 lowest count of age group patients having the liver disease¶

Age Count
35 44 2
34 43 2
51 61 2
57 67 1
65 78 1
59 69 1
0 7 1
53 63 1
47 56 1
1 8 1
18 27 1
14 23 1
11 20 1
10 19 1
8 17 1
6 15 1
5 14 1
3 12 1
2 10 1
66 90 1

Top 20 lowest count of patients without the liver disease as per age group¶

Age Count
22 32 2
49 63 1
44 57 1
53 69 1
55 72 1
56 84 1
29 40 1
41 54 1
35 47 1
33 44 1
1 6 1
28 39 1
10 19 1
7 16 1
6 14 1
5 13 1
4 12 1
3 11 1
2 7 1
57 85 1

Top 20 lowest count of age group patients having the liver disease VS Top 20 lowest count of patients without the liver disease as per age group¶

Text(0.5, 0.98, 'Bar Graph of top 20 lowest count of patients with and without the liver disease as per the age group')

Further analysis and Data visualization¶

Note:

  • 1 represents Normal level
  • 0 represents Not Normal level

Further analysis and Data Visualization on Total Bilirubin¶

Creating a function based on the recommended range of Total Bilirubin¶

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 13 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Age                         566 non-null    int64  
 1   Gender                      566 non-null    object 
 2   Total_Bilirubin             566 non-null    float64
 3   Direct_Bilirubin            566 non-null    float64
 4   Alkaline_Phosphotase        566 non-null    int64  
 5   Alamine_Aminotransferase    566 non-null    int64  
 6   Aspartate_Aminotransferase  566 non-null    int64  
 7   Total_Protiens              566 non-null    float64
 8   Albumin                     566 non-null    float64
 9   Albumin_and_Globulin_Ratio  566 non-null    float64
 10  Dataset                     566 non-null    int64  
 11  Dataset_Details             566 non-null    object 
 12  Gender_Binary               566 non-null    int64  
dtypes: float64(5), int64(6), object(2)
memory usage: 57.6+ KB
0       0.7
1      10.9
2       7.3
3       1.0
4       3.9
       ... 
561     0.5
562     0.6
563     0.8
564     1.3
565     1.0
Name: Total_Bilirubin, Length: 566, dtype: float64
Total_Bilirubin Total_Bilirubin_Binary
0 0.7 1
1 10.9 0
2 7.3 0
3 1.0 1
4 3.9 0
... ... ...
561 0.5 1
562 0.6 1
563 0.8 1
564 1.3 0
565 1.0 1

566 rows × 2 columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 14 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Age                         566 non-null    int64  
 1   Gender                      566 non-null    object 
 2   Total_Bilirubin             566 non-null    float64
 3   Direct_Bilirubin            566 non-null    float64
 4   Alkaline_Phosphotase        566 non-null    int64  
 5   Alamine_Aminotransferase    566 non-null    int64  
 6   Aspartate_Aminotransferase  566 non-null    int64  
 7   Total_Protiens              566 non-null    float64
 8   Albumin                     566 non-null    float64
 9   Albumin_and_Globulin_Ratio  566 non-null    float64
 10  Dataset                     566 non-null    int64  
 11  Dataset_Details             566 non-null    object 
 12  Gender_Binary               566 non-null    int64  
 13  Total_Bilirubin_Binary      566 non-null    int64  
dtypes: float64(5), int64(7), object(2)
memory usage: 62.0+ KB
Total_Bilirubin Total_Bilirubin_Binary Total_Bilirubin_Description
0 0.7 1 Normal
1 10.9 0 Not in Normal Range
2 7.3 0 Not in Normal Range
3 1.0 1 Normal
4 3.9 0 Not in Normal Range
... ... ... ...
561 0.5 1 Normal
562 0.6 1 Normal
563 0.8 1 Normal
564 1.3 0 Not in Normal Range
565 1.0 1 Normal

566 rows × 3 columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 15 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Age                          566 non-null    int64  
 1   Gender                       566 non-null    object 
 2   Total_Bilirubin              566 non-null    float64
 3   Direct_Bilirubin             566 non-null    float64
 4   Alkaline_Phosphotase         566 non-null    int64  
 5   Alamine_Aminotransferase     566 non-null    int64  
 6   Aspartate_Aminotransferase   566 non-null    int64  
 7   Total_Protiens               566 non-null    float64
 8   Albumin                      566 non-null    float64
 9   Albumin_and_Globulin_Ratio   566 non-null    float64
 10  Dataset                      566 non-null    int64  
 11  Dataset_Details              566 non-null    object 
 12  Gender_Binary                566 non-null    int64  
 13  Total_Bilirubin_Binary       566 non-null    int64  
 14  Total_Bilirubin_Description  566 non-null    object 
dtypes: float64(5), int64(7), object(3)
memory usage: 66.5+ KB
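
The binary and description columns above can be derived with a small threshold function. The cut-off used by the notebook is not shown. The displayed rows only imply that it lies between 1.0 (flagged normal) and 1.3 (flagged not normal), so the value 1.2 below is an assumption, as are the names `TOTAL_BILIRUBIN_UPPER` and `total_bilirubin_flag`.

```python
import numpy as np
import pandas as pd

TOTAL_BILIRUBIN_UPPER = 1.2  # assumed upper limit of the normal range; not stated in the source

def total_bilirubin_flag(value, upper=TOTAL_BILIRUBIN_UPPER):
    """Return 1 when the value is within the assumed normal range, else 0."""
    return 1 if value <= upper else 0

df = pd.DataFrame({"Total_Bilirubin": [0.7, 10.9, 7.3, 1.0, 3.9, 1.3]})

# 1 = normal level, 0 = not normal level (same convention as the note above)
df["Total_Bilirubin_Binary"] = df["Total_Bilirubin"].apply(total_bilirubin_flag)
df["Total_Bilirubin_Description"] = np.where(
    df["Total_Bilirubin_Binary"] == 1, "Normal", "Not in Normal Range"
)

print(df)
```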

People by age whose Total Bilirubin is not at the recommended level¶

Age
7      1
13     1
14     1
16     2
18     3
19     1
20     2
21     2
22     5
23     2
24     1
26     8
27     1
29     1
30     2
31     2
32    13
33    10
34     5
35     4
36     4
37     3
38    10
39     4
40     9
41     3
42     8
43     2
44     1
45    14
46    10
47     3
48     9
49     4
50    12
51     4
52     2
53     2
54     5
55    10
56     2
57     4
58     4
60    27
61     1
62     6
64     4
65     6
66     7
67     1
68     1
70     6
72     3
73     2
74     1
75     9
90     1
Name: Total_Bilirubin_Binary, dtype: int64

Top 10 ages of people whose Total Bilirubin is not at the recommended level¶

Age Count
43 60 27
29 45 14
16 32 13
34 50 12
39 55 10
30 46 10
22 38 10
17 33 10
24 40 9
55 75 9

People by age with a normal Total Bilirubin level¶

Age
4     2
6     1
7     1
8     1
10    1
     ..
74    3
75    5
78    1
84    1
85    1
Name: Total_Bilirubin_Binary, Length: 69, dtype: int64

People whose Total Bilirubin is not at the recommended level, by gender¶

Gender
Female     36
Male      235
Name: Total_Bilirubin_Binary, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Total Builirubin in recommended level as per the gender')

People with a normal Total Bilirubin level, by gender¶

Gender
Female    102
Male      193
Name: Total_Bilirubin_Binary, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Total Builirubin in recommended level as per the gender')
Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset Dataset_Details Gender_Binary Total_Bilirubin_Binary Total_Bilirubin_Description
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1 Patient with liver disease 0 1 Normal
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1 Patient with liver disease 1 0 Not in Normal Range
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1 Patient with liver disease 1 0 Not in Normal Range
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1 Patient with liver disease 1 1 Normal
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1 Patient with liver disease 1 0 Not in Normal Range
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
561 60 Male 0.5 0.1 500 20 34 5.9 1.6 0.37 2 Patient with no liver disease 1 1 Normal
562 40 Male 0.6 0.1 98 35 31 6.0 3.2 1.10 1 Patient with liver disease 1 1 Normal
563 52 Male 0.8 0.2 245 48 49 6.4 3.2 1.00 1 Patient with liver disease 1 1 Normal
564 31 Male 1.3 0.5 184 29 32 6.8 3.4 1.00 1 Patient with liver disease 1 0 Not in Normal Range
565 38 Male 1.0 0.3 216 21 24 7.3 4.4 1.50 2 Patient with no liver disease 1 1 Normal

566 rows × 15 columns

Normal Total Bilirubin level vs Total Bilirubin not at a normal level, by gender¶

Text(0.5, 0.98, 'Bar graph of Normal level Total Bilrubin vs Total Bilrubin not in normal level as per the gender')

Liver condition of the patients according to the Total Bilirubin level¶

People who have a normal Total Bilirubin level and have liver disease¶

Gender
Female     60
Male      116
Name: Total_Bilirubin_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Total Builirubin in recommended level and have liver disease as per the gender')

People who have a normal Total Bilirubin level and do not have liver disease¶

Gender
Female    42
Male      77
Name: Total_Bilirubin_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Total Builirubin in recommended level and do not have liver disease as per the gender')

Normal Total Bilirubin level with liver disease VS normal Total Bilirubin level without liver disease¶

Text(0.5, 0.98, 'Bar graph of Normal level Total Bilrubin in normal level and having liver disease VS  Total Bilrubin in normal level and not having a liver disease')

People whose Total Bilirubin is not at a normal level and who have liver disease¶

Gender
Female     30
Male      198
Name: Total_Bilirubin_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Total Bilrubin in normal level and have liver diesease as per the gender')

People whose Total Bilirubin is not at a normal level and who do not have liver disease either¶

Gender
Female     6
Male      37
Name: Total_Bilirubin_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Total Bilrubin in normal level and do not have liver diesease as per the gender')

Total Bilirubin not at a normal level with liver disease VS Total Bilirubin not at a normal level without liver disease¶

Text(0.5, 0.98, 'Bar graph of Total Bilrubin not in normal level and having liver disease VS  Total Bilrubin not in normal level and not having a liver disease')
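
The four gender counts in this part each come from filtering on the bilirubin status and the disease status, then counting per gender. A minimal sketch of one such filter on toy rows, followed by `pd.crosstab` as a compact alternative that shows all four combinations at once (the crosstab is not part of the original notebook):

```python
import pandas as pd

# Toy rows (illustrative values only).
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Male", "Female", "Male"],
    "Dataset_Details": [
        "Patient with liver disease",
        "Patient with liver disease",
        "Patient with liver disease",
        "Patient with no liver disease",
        "Patient with no liver disease",
        "Patient with liver disease",
    ],
    "Total_Bilirubin_Description": [
        "Normal", "Not in Normal Range", "Not in Normal Range",
        "Normal", "Normal", "Normal",
    ],
})

# Filter-and-count, matching the shape of the outputs above.
mask = (
    (df["Total_Bilirubin_Description"] == "Normal")
    & (df["Dataset_Details"] == "Patient with liver disease")
)
normal_with_disease = df[mask].groupby("Gender")["Total_Bilirubin_Description"].count()

# Alternative: every status/disease combination by gender in one table.
table = pd.crosstab(
    [df["Total_Bilirubin_Description"], df["Dataset_Details"]],
    df["Gender"],
)

print(normal_with_disease)
print(table)
```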

Further analysis and Data Visualization on Direct Bilirubin¶

Creating a function based on the recommended range of Direct Bilirubin¶

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 15 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Age                          566 non-null    int64  
 1   Gender                       566 non-null    object 
 2   Total_Bilirubin              566 non-null    float64
 3   Direct_Bilirubin             566 non-null    float64
 4   Alkaline_Phosphotase         566 non-null    int64  
 5   Alamine_Aminotransferase     566 non-null    int64  
 6   Aspartate_Aminotransferase   566 non-null    int64  
 7   Total_Protiens               566 non-null    float64
 8   Albumin                      566 non-null    float64
 9   Albumin_and_Globulin_Ratio   566 non-null    float64
 10  Dataset                      566 non-null    int64  
 11  Dataset_Details              566 non-null    object 
 12  Gender_Binary                566 non-null    int64  
 13  Total_Bilirubin_Binary       566 non-null    int64  
 14  Total_Bilirubin_Description  566 non-null    object 
dtypes: float64(5), int64(7), object(3)
memory usage: 66.5+ KB
0      0.1
1      5.5
2      4.1
3      0.4
4      2.0
      ... 
561    0.1
562    0.1
563    0.2
564    0.5
565    0.3
Name: Direct_Bilirubin, Length: 566, dtype: float64
Direct_Bilirubin Direct_Bilirubin_Binary
0 0.1 1
1 5.5 0
2 4.1 0
3 0.4 1
4 2.0 0
... ... ...
561 0.1 1
562 0.1 1
563 0.2 1
564 0.5 0
565 0.3 1

566 rows × 2 columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 16 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Age                          566 non-null    int64  
 1   Gender                       566 non-null    object 
 2   Total_Bilirubin              566 non-null    float64
 3   Direct_Bilirubin             566 non-null    float64
 4   Alkaline_Phosphotase         566 non-null    int64  
 5   Alamine_Aminotransferase     566 non-null    int64  
 6   Aspartate_Aminotransferase   566 non-null    int64  
 7   Total_Protiens               566 non-null    float64
 8   Albumin                      566 non-null    float64
 9   Albumin_and_Globulin_Ratio   566 non-null    float64
 10  Dataset                      566 non-null    int64  
 11  Dataset_Details              566 non-null    object 
 12  Gender_Binary                566 non-null    int64  
 13  Total_Bilirubin_Binary       566 non-null    int64  
 14  Total_Bilirubin_Description  566 non-null    object 
 15  Direct_Bilirubin_Binary      566 non-null    int64  
dtypes: float64(5), int64(8), object(3)
memory usage: 70.9+ KB
Direct_Bilirubin Direct_Bilirubin_Binary Direct_Bilirubin_Description
0 0.1 1 Normal
1 5.5 0 Not in Normal Range
2 4.1 0 Not in Normal Range
3 0.4 1 Normal
4 2.0 0 Not in Normal Range
... ... ... ...
561 0.1 1 Normal
562 0.1 1 Normal
563 0.2 1 Normal
564 0.5 0 Not in Normal Range
565 0.3 1 Normal

566 rows × 3 columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 17 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Age                           566 non-null    int64  
 1   Gender                        566 non-null    object 
 2   Total_Bilirubin               566 non-null    float64
 3   Direct_Bilirubin              566 non-null    float64
 4   Alkaline_Phosphotase          566 non-null    int64  
 5   Alamine_Aminotransferase      566 non-null    int64  
 6   Aspartate_Aminotransferase    566 non-null    int64  
 7   Total_Protiens                566 non-null    float64
 8   Albumin                       566 non-null    float64
 9   Albumin_and_Globulin_Ratio    566 non-null    float64
 10  Dataset                       566 non-null    int64  
 11  Dataset_Details               566 non-null    object 
 12  Gender_Binary                 566 non-null    int64  
 13  Total_Bilirubin_Binary        566 non-null    int64  
 14  Total_Bilirubin_Description   566 non-null    object 
 15  Direct_Bilirubin_Binary       566 non-null    int64  
 16  Direct_Bilirubin_Description  566 non-null    object 
dtypes: float64(5), int64(8), object(4)
memory usage: 75.3+ KB

People by age whose Direct Bilirubin is not at the recommended level¶

Age
7      1
13     1
14     1
16     2
18     3
19     1
20     2
21     2
22     4
23     2
24     1
26     7
31     1
32    13
33    10
34     5
35     4
36     3
37     2
38     8
39     4
40     8
41     3
42     8
43     2
44     1
45    13
46     9
47     3
48    10
49     3
50    11
51     4
52     2
53     2
54     5
55     9
56     2
57     3
58     5
60    27
61     1
62     5
64     3
65     5
66     7
67     1
68     1
70     4
72     3
73     2
75     8
Name: Direct_Bilirubin_Binary, dtype: int64

Top 10 ages of people whose Direct Bilirubin is not at the recommended level¶

Age Count
40 60 27
26 45 13
13 32 13
31 50 11
29 48 10
14 33 10
36 55 9
27 46 9
23 42 8
21 40 8

People by age with a normal Direct Bilirubin level¶

Age
4     2
6     1
7     1
8     1
10    1
     ..
75    6
78    1
84    1
85    1
90    1
Name: Direct_Bilirubin_Binary, Length: 70, dtype: int64

People whose Direct Bilirubin is not at the recommended level, by gender¶

Gender
Female     33
Male      214
Name: Direct_Bilirubin_Binary, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Direct Builirubin in recommended level as per the gender')

People with a normal Direct Bilirubin level, by gender¶

Gender
Female    105
Male      214
Name: Direct_Bilirubin_Binary, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Direct Builirubin in recommended level as per the gender')
Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset Dataset_Details Gender_Binary Total_Bilirubin_Binary Total_Bilirubin_Description Direct_Bilirubin_Binary Direct_Bilirubin_Description
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1 Patient with liver disease 0 1 Normal 1 Normal
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1 Patient with liver disease 1 0 Not in Normal Range 0 Not in Normal Range
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1 Patient with liver disease 1 0 Not in Normal Range 0 Not in Normal Range
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1 Patient with liver disease 1 1 Normal 1 Normal
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1 Patient with liver disease 1 0 Not in Normal Range 0 Not in Normal Range
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
561 60 Male 0.5 0.1 500 20 34 5.9 1.6 0.37 2 Patient with no liver disease 1 1 Normal 1 Normal
562 40 Male 0.6 0.1 98 35 31 6.0 3.2 1.10 1 Patient with liver disease 1 1 Normal 1 Normal
563 52 Male 0.8 0.2 245 48 49 6.4 3.2 1.00 1 Patient with liver disease 1 1 Normal 1 Normal
564 31 Male 1.3 0.5 184 29 32 6.8 3.4 1.00 1 Patient with liver disease 1 0 Not in Normal Range 0 Not in Normal Range
565 38 Male 1.0 0.3 216 21 24 7.3 4.4 1.50 2 Patient with no liver disease 1 1 Normal 1 Normal

566 rows × 17 columns

Normal Direct Bilirubin level vs. Direct Bilirubin not in the normal range, by gender¶
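A side-by-side comparison like this one needs the normal and not-normal counts per gender in a single table. A minimal sketch, assuming `pd.crosstab` is an acceptable substitute for the two separate group-by results shown earlier; the ten rows are copied from the notebook's table.

```python
import pandas as pd

# Ten sample rows copied from the notebook's table (not the full dataset).
df = pd.DataFrame({
    "Gender": ["Female"] + ["Male"] * 9,
    "Direct_Bilirubin_Description": [
        "Normal", "Not in Normal Range", "Not in Normal Range", "Normal",
        "Not in Normal Range", "Normal", "Normal", "Normal",
        "Not in Normal Range", "Normal",
    ],
})

# Rows are genders, columns are the two description labels, cells are row counts.
table = pd.crosstab(df["Gender"], df["Direct_Bilirubin_Description"])
print(table)
```

Calling `table.plot(kind="bar")` on such a table would draw one bar group per gender, which is one possible way to get the comparison chart.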

[Figure: bar graphs of normal Direct Bilirubin level vs. Direct Bilirubin not in the normal range, by gender]

Liver condition of the patient according to the Direct Bilirubin level¶

People who have Direct Bilirubin in the normal range and have liver disease¶
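Each of the four breakdowns in this part combines two conditions: the bilirubin label and the disease label. The sketch below shows one way to do that with a boolean mask. It assumes `Dataset == 1` means "Patient with liver disease", which matches the `Dataset_Details` column in the table above, and it uses ten sample rows rather than the full data.

```python
import pandas as pd

# Ten sample rows copied from the notebook's table (not the full dataset).
df = pd.DataFrame({
    "Gender": ["Female"] + ["Male"] * 9,
    "Dataset": [1, 1, 1, 1, 1, 2, 1, 1, 1, 2],
    "Direct_Bilirubin_Description": [
        "Normal", "Not in Normal Range", "Not in Normal Range", "Normal",
        "Not in Normal Range", "Normal", "Normal", "Normal",
        "Not in Normal Range", "Normal",
    ],
})

# Both conditions must hold: normal Direct Bilirubin AND liver disease.
mask = (df["Direct_Bilirubin_Description"] == "Normal") & (df["Dataset"] == 1)
normal_with_disease = df[mask].groupby("Gender")["Direct_Bilirubin_Description"].count()
print(normal_with_disease)
```

The parentheses around each comparison are required, because `&` binds more tightly than `==` in Python.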

Gender
Female     61
Male      130
Name: Direct_Bilirubin_Description, dtype: int64
[Figure: bar graph of people who have Direct Bilirubin in the recommended level and have liver disease, by gender]

People who have Direct Bilirubin in the normal range and do not have liver disease¶

Gender
Female    44
Male      84
Name: Direct_Bilirubin_Description, dtype: int64
[Figure: bar graph of people who have Direct Bilirubin in the recommended level and do not have liver disease, by gender]

Direct Bilirubin in the normal range with liver disease vs. Direct Bilirubin in the normal range without liver disease¶

[Figure: bar graphs of Direct Bilirubin in the normal range with liver disease vs. Direct Bilirubin in the normal range without liver disease]

People who do not have Direct Bilirubin in the normal range and have liver disease¶

Gender
Female     29
Male      184
Name: Direct_Bilirubin_Description, dtype: int64
[Figure: bar graph of people who do not have Direct Bilirubin in the normal range and have liver disease, by gender]

People who do not have Direct Bilirubin in the normal range and do not have liver disease¶

Gender
Female     4
Male      30
Name: Direct_Bilirubin_Description, dtype: int64
[Figure: bar graph of people who do not have Direct Bilirubin in the normal range and do not have liver disease, by gender]

Direct Bilirubin not in the normal range with liver disease vs. Direct Bilirubin not in the normal range without liver disease¶

[Figure: bar graphs of Direct Bilirubin not in the normal range with liver disease vs. Direct Bilirubin not in the normal range without liver disease]

Further analysis and Data Visualization on Alkaline Phosphatase (ALP)¶

Creating a function based on the recommended range of Alkaline Phosphatase¶
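The function itself is not shown in this export, so the following is only a sketch of what such a function might look like. The range of 44 to 147 IU/L is an assumption (a commonly cited adult reference range), and `alp_flag`, `ALP_LOW` and `ALP_HIGH` are hypothetical names. With that range, the ten sample values below reproduce the flags shown in the notebook's `Alkaline_Phosphotase_Binary` column.

```python
import pandas as pd

# Assumed reference range in IU/L; the notebook's own cutoff is not shown.
ALP_LOW, ALP_HIGH = 44, 147


def alp_flag(value):
    """Return 1 if the ALP value lies inside the assumed range, else 0."""
    return 1 if ALP_LOW <= value <= ALP_HIGH else 0


# Sample values copied from the Alkaline_Phosphotase column.
alp = pd.Series(
    [187, 699, 490, 182, 195, 500, 98, 245, 184, 216], name="Alkaline_Phosphotase"
)
alp_binary = alp.apply(alp_flag)
print(alp_binary.tolist())
```

Only the value 98 falls inside the assumed range, so it is the only one flagged 1.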

0      187
1      699
2      490
3      182
4      195
      ... 
561    500
562     98
563    245
564    184
565    216
Name: Alkaline_Phosphotase, Length: 566, dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 18 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Age                           566 non-null    int64  
 1   Gender                        566 non-null    object 
 2   Total_Bilirubin               566 non-null    float64
 3   Direct_Bilirubin              566 non-null    float64
 4   Alkaline_Phosphotase          566 non-null    int64  
 5   Alamine_Aminotransferase      566 non-null    int64  
 6   Aspartate_Aminotransferase    566 non-null    int64  
 7   Total_Protiens                566 non-null    float64
 8   Albumin                       566 non-null    float64
 9   Albumin_and_Globulin_Ratio    566 non-null    float64
 10  Dataset                       566 non-null    int64  
 11  Dataset_Details               566 non-null    object 
 12  Gender_Binary                 566 non-null    int64  
 13  Total_Bilirubin_Binary        566 non-null    int64  
 14  Total_Bilirubin_Description   566 non-null    object 
 15  Direct_Bilirubin_Binary       566 non-null    int64  
 16  Direct_Bilirubin_Description  566 non-null    object 
 17  Alkaline_Phosphotase_Binary   566 non-null    int64  
dtypes: float64(5), int64(9), object(4)
memory usage: 79.7+ KB
Alkaline_Phosphotase Alkaline_Phosphotase_Binary
0 187 0
1 699 0
2 490 0
3 182 0
4 195 0
... ... ...
561 500 0
562 98 1
563 245 0
564 184 0
565 216 0

566 rows × 2 columns

Alkaline_Phosphotase Alkaline_Phosphotase_Binary Alkaline_Phosphotase_Description
0 187 0 Not in Normal Range
1 699 0 Not in Normal Range
2 490 0 Not in Normal Range
3 182 0 Not in Normal Range
4 195 0 Not in Normal Range
... ... ... ...
561 500 0 Not in Normal Range
562 98 1 Normal
563 245 0 Not in Normal Range
564 184 0 Not in Normal Range
565 216 0 Not in Normal Range

566 rows × 3 columns
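A description column like the one above can be derived directly from the binary flag. This is a minimal sketch using `Series.map` with a small lookup dictionary; the label strings are the ones that appear in the notebook's output, and the sample rows are copied from it.

```python
import pandas as pd

# Sample rows copied from the notebook's two-column output.
df = pd.DataFrame({
    "Alkaline_Phosphotase": [187, 699, 490, 182, 195, 500, 98, 245, 184, 216],
    "Alkaline_Phosphotase_Binary": [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
})

# Translate each flag into its human-readable label.
labels = {1: "Normal", 0: "Not in Normal Range"}
df["Alkaline_Phosphotase_Description"] = df["Alkaline_Phosphotase_Binary"].map(labels)
print(df)
```

Any flag value missing from the dictionary would become `NaN`, which makes unexpected values easy to spot.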

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 19 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Age                               566 non-null    int64  
 1   Gender                            566 non-null    object 
 2   Total_Bilirubin                   566 non-null    float64
 3   Direct_Bilirubin                  566 non-null    float64
 4   Alkaline_Phosphotase              566 non-null    int64  
 5   Alamine_Aminotransferase          566 non-null    int64  
 6   Aspartate_Aminotransferase        566 non-null    int64  
 7   Total_Protiens                    566 non-null    float64
 8   Albumin                           566 non-null    float64
 9   Albumin_and_Globulin_Ratio        566 non-null    float64
 10  Dataset                           566 non-null    int64  
 11  Dataset_Details                   566 non-null    object 
 12  Gender_Binary                     566 non-null    int64  
 13  Total_Bilirubin_Binary            566 non-null    int64  
 14  Total_Bilirubin_Description       566 non-null    object 
 15  Direct_Bilirubin_Binary           566 non-null    int64  
 16  Direct_Bilirubin_Description      566 non-null    object 
 17  Alkaline_Phosphotase_Binary       566 non-null    int64  
 18  Alkaline_Phosphotase_Description  566 non-null    object 
dtypes: float64(5), int64(9), object(5)
memory usage: 84.1+ KB

People by age whose Alkaline Phosphatase (ALP) is not in the recommended level¶

Age
4      2
6      1
7      2
8      1
10     1
      ..
75    14
78     1
84     1
85     1
90     1
Name: Alkaline_Phosphotase_Binary, Length: 72, dtype: int64

Top 10 ages by number of people whose Alkaline Phosphatase (ALP) is not in the recommended level¶

Age Count
53 60 34
39 45 22
44 50 21
26 32 19
36 42 18
32 38 17
49 55 17
42 48 17
58 65 16
40 46 15

People by age with a normal Alkaline Phosphatase (ALP) level¶

Age
17    1
20    1
21    3
22    1
25    2
26    5
28    2
29    1
30    1
32    1
33    1
35    1
36    1
37    3
38    3
40    1
41    1
42    2
43    2
45    2
46    1
48    3
49    1
50    2
52    1
55    1
56    1
58    3
61    1
62    1
64    1
65    1
66    1
68    1
69    1
70    1
72    2
Name: Alkaline_Phosphotase_Binary, dtype: int64

People whose Alkaline Phosphatase (ALP) is not in the recommended level, by gender¶

Gender
Female    122
Male      386
Name: Alkaline_Phosphotase_Binary, dtype: int64
[Figure: bar graph of people whose Alkaline Phosphatase (ALP) is not in the recommended level, by gender]

People with a normal Alkaline Phosphatase (ALP) level, by gender¶

Gender
Female    16
Male      42
Name: Alkaline_Phosphotase_Binary, dtype: int64
[Figure: bar graph of people whose Alkaline Phosphatase (ALP) is in the recommended level, by gender]
Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset Dataset_Details Gender_Binary Total_Bilirubin_Binary Total_Bilirubin_Description Direct_Bilirubin_Binary Direct_Bilirubin_Description Alkaline_Phosphotase_Binary Alkaline_Phosphotase_Description
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1 Patient with liver disease 0 1 Normal 1 Normal 0 Not in Normal Range
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1 Patient with liver disease 1 0 Not in Normal Range 0 Not in Normal Range 0 Not in Normal Range
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1 Patient with liver disease 1 0 Not in Normal Range 0 Not in Normal Range 0 Not in Normal Range
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1 Patient with liver disease 1 1 Normal 1 Normal 0 Not in Normal Range
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1 Patient with liver disease 1 0 Not in Normal Range 0 Not in Normal Range 0 Not in Normal Range
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
561 60 Male 0.5 0.1 500 20 34 5.9 1.6 0.37 2 Patient with no liver disease 1 1 Normal 1 Normal 0 Not in Normal Range
562 40 Male 0.6 0.1 98 35 31 6.0 3.2 1.10 1 Patient with liver disease 1 1 Normal 1 Normal 1 Normal
563 52 Male 0.8 0.2 245 48 49 6.4 3.2 1.00 1 Patient with liver disease 1 1 Normal 1 Normal 0 Not in Normal Range
564 31 Male 1.3 0.5 184 29 32 6.8 3.4 1.00 1 Patient with liver disease 1 0 Not in Normal Range 0 Not in Normal Range 0 Not in Normal Range
565 38 Male 1.0 0.3 216 21 24 7.3 4.4 1.50 2 Patient with no liver disease 1 1 Normal 1 Normal 0 Not in Normal Range

566 rows × 19 columns

Normal Alkaline Phosphatase (ALP) level vs. Alkaline Phosphatase (ALP) not in the normal range, by gender¶

[Figure: bar graphs of normal Alkaline Phosphatase (ALP) level vs. Alkaline Phosphatase (ALP) not in the normal range, by gender]

Liver condition of the patient according to the Alkaline Phosphatase (ALP) level¶
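The four gender breakdowns that follow could also be computed in one pass by grouping on all three columns at once. This is a sketch of that alternative, not the notebook's own code; the ten rows are copied from the notebook's table.

```python
import pandas as pd

disease = "Patient with liver disease"
no_disease = "Patient with no liver disease"
normal = "Normal"
not_normal = "Not in Normal Range"

# Ten sample rows copied from the notebook's table (not the full dataset).
df = pd.DataFrame({
    "Gender": ["Female"] + ["Male"] * 9,
    "Dataset_Details": [disease] * 5 + [no_disease, disease, disease, disease, no_disease],
    "Alkaline_Phosphotase_Description": [not_normal] * 6 + [normal] + [not_normal] * 3,
})

# One count per (ALP label, disease label, gender) combination present in the data.
counts = df.groupby(
    ["Alkaline_Phosphotase_Description", "Dataset_Details", "Gender"]
).size()
print(counts)
```

A single lookup such as `counts.loc[(not_normal, disease, "Male")]` then replaces a separate filter-and-count step for each case.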

People who have Alkaline Phosphatase (ALP) in the normal range and have liver disease¶

Gender
Female     8
Male      28
Name: Alkaline_Phosphotase_Binary, dtype: int64
[Figure: bar graph of people who have Alkaline Phosphatase (ALP) in the recommended level and have liver disease, by gender]

People who have Alkaline Phosphatase (ALP) in the normal range and do not have liver disease¶

Gender
Female     8
Male      14
Name: Alkaline_Phosphotase_Description, dtype: int64
[Figure: bar graph of people who have Alkaline Phosphatase (ALP) in the recommended level and do not have liver disease, by gender]

Alkaline Phosphatase (ALP) in the normal range with liver disease vs. Alkaline Phosphatase (ALP) in the normal range without liver disease¶

[Figure: bar graphs of Alkaline Phosphatase (ALP) in the normal range with liver disease vs. Alkaline Phosphatase (ALP) in the normal range without liver disease]

People who do not have Alkaline Phosphatase (ALP) in the normal range and have liver disease¶

Gender
Female     82
Male      286
Name: Alkaline_Phosphotase_Description, dtype: int64
[Figure: bar graph of people who do not have Alkaline Phosphatase (ALP) in the normal range and have liver disease, by gender]

People who do not have Alkaline Phosphatase (ALP) in the normal range and do not have liver disease¶

Gender
Female     40
Male      100
Name: Alkaline_Phosphotase_Description, dtype: int64
[Figure: bar graph of people who do not have Alkaline Phosphatase (ALP) in the normal range and do not have liver disease, by gender]
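The ALP counts reported in this section can be cross-checked with simple arithmetic: for each gender, the with-disease and without-disease counts should add up to the gender total, and all groups together should cover every row. The sketch below does that check with the numbers copied from the outputs above; the variable names are illustrative.

```python
# Gender counts for ALP, copied from the outputs in this section.
not_normal_total = {"Female": 122, "Male": 386}
normal_total = {"Female": 16, "Male": 42}
normal_with_disease = {"Female": 8, "Male": 28}
normal_without_disease = {"Female": 8, "Male": 14}
not_normal_with_disease = {"Female": 82, "Male": 286}
not_normal_without_disease = {"Female": 40, "Male": 100}

for gender in ("Female", "Male"):
    # Each total should split exactly into with-disease and without-disease.
    assert (normal_with_disease[gender] + normal_without_disease[gender]
            == normal_total[gender])
    assert (not_normal_with_disease[gender] + not_normal_without_disease[gender]
            == not_normal_total[gender])

# All patients together should match the 566 rows in the DataFrame.
total_rows = sum(normal_total.values()) + sum(not_normal_total.values())
print(total_rows)  # 566
```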

Alkaline Phosphatase (ALP) not in the normal range with liver disease vs. Alkaline Phosphatase (ALP) not in the normal range without liver disease¶

[Figure: bar graphs of Alkaline Phosphatase (ALP) not in the normal range with liver disease vs. Alkaline Phosphatase (ALP) not in the normal range without liver disease]

Further analysis and Data Visualization on SGPT Alanine Aminotransferase (ALT)¶

Creating a function based on the recommended range of Alanine Aminotransferase¶
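For a single upper cutoff, the flag can be computed for the whole column at once instead of value by value. The sketch below uses `numpy.where` for that. The cutoff of 40 U/L is an assumption: the notebook does not show its threshold, but 40 is consistent with the sample rows (35 is flagged normal, 48 is not). `ALT_UPPER` is a hypothetical name.

```python
import numpy as np
import pandas as pd

ALT_UPPER = 40  # assumed cutoff in U/L, not taken from the notebook

# Sample values copied from the Alamine_Aminotransferase column.
df = pd.DataFrame({"Alamine_Aminotransferase": [16, 64, 60, 14, 27, 20, 35, 48, 29, 21]})

# 1 where the value is at or below the cutoff, 0 otherwise.
df["Alamine_Aminotransferase_Binary"] = np.where(
    df["Alamine_Aminotransferase"] <= ALT_UPPER, 1, 0
)
print(df)
```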

0      16
1      64
2      60
3      14
4      27
       ..
561    20
562    35
563    48
564    29
565    21
Name: Alamine_Aminotransferase, Length: 566, dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 20 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Age                               566 non-null    int64  
 1   Gender                            566 non-null    object 
 2   Total_Bilirubin                   566 non-null    float64
 3   Direct_Bilirubin                  566 non-null    float64
 4   Alkaline_Phosphotase              566 non-null    int64  
 5   Alamine_Aminotransferase          566 non-null    int64  
 6   Aspartate_Aminotransferase        566 non-null    int64  
 7   Total_Protiens                    566 non-null    float64
 8   Albumin                           566 non-null    float64
 9   Albumin_and_Globulin_Ratio        566 non-null    float64
 10  Dataset                           566 non-null    int64  
 11  Dataset_Details                   566 non-null    object 
 12  Gender_Binary                     566 non-null    int64  
 13  Total_Bilirubin_Binary            566 non-null    int64  
 14  Total_Bilirubin_Description       566 non-null    object 
 15  Direct_Bilirubin_Binary           566 non-null    int64  
 16  Direct_Bilirubin_Description      566 non-null    object 
 17  Alkaline_Phosphotase_Binary       566 non-null    int64  
 18  Alkaline_Phosphotase_Description  566 non-null    object 
 19  Alamine_Aminotransferase_Binary   566 non-null    int64  
dtypes: float64(5), int64(10), object(5)
memory usage: 88.6+ KB
Alamine_Aminotransferase Alamine_Aminotransferase_Binary
0 16 1
1 64 0
2 60 0
3 14 1
4 27 1
... ... ...
561 20 1
562 35 1
563 48 0
564 29 1
565 21 1

566 rows × 2 columns

Alamine_Aminotransferase Alamine_Aminotransferase_Binary Alamine_Aminotransferase_Description
0 16 1 Normal
1 64 0 Not in Normal Range
2 60 0 Not in Normal Range
3 14 1 Normal
4 27 1 Normal
... ... ... ...
561 20 1 Normal
562 35 1 Normal
563 48 0 Not in Normal Range
564 29 1 Normal
565 21 1 Normal

566 rows × 3 columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 21 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   Age                                   566 non-null    int64  
 1   Gender                                566 non-null    object 
 2   Total_Bilirubin                       566 non-null    float64
 3   Direct_Bilirubin                      566 non-null    float64
 4   Alkaline_Phosphotase                  566 non-null    int64  
 5   Alamine_Aminotransferase              566 non-null    int64  
 6   Aspartate_Aminotransferase            566 non-null    int64  
 7   Total_Protiens                        566 non-null    float64
 8   Albumin                               566 non-null    float64
 9   Albumin_and_Globulin_Ratio            566 non-null    float64
 10  Dataset                               566 non-null    int64  
 11  Dataset_Details                       566 non-null    object 
 12  Gender_Binary                         566 non-null    int64  
 13  Total_Bilirubin_Binary                566 non-null    int64  
 14  Total_Bilirubin_Description           566 non-null    object 
 15  Direct_Bilirubin_Binary               566 non-null    int64  
 16  Direct_Bilirubin_Description          566 non-null    object 
 17  Alkaline_Phosphotase_Binary           566 non-null    int64  
 18  Alkaline_Phosphotase_Description      566 non-null    object 
 19  Alamine_Aminotransferase_Binary       566 non-null    int64  
 20  Alamine_Aminotransferase_Description  566 non-null    object 
dtypes: float64(5), int64(10), object(6)
memory usage: 93.0+ KB

People by age whose Alanine Aminotransferase (ALT) is not in the recommended level¶

Age
4     1
6     1
7     1
12    2
14    1
     ..
69    1
70    6
73    1
75    7
90    1
Name: Alamine_Aminotransferase_Binary, Length: 61, dtype: int64

Top 10 ages by number of people whose Alanine Aminotransferase (ALT) is not in the recommended level¶

Age Count
47 60 23
21 32 16
33 45 12
27 38 11
43 55 10
38 50 10
29 40 10
15 26 9
36 48 9
22 33 8

People by age with a normal Alanine Aminotransferase (ALT) level¶

Age
4     1
7     1
8     1
10    1
11    1
     ..
74    4
75    7
78    1
84    1
85    1
Name: Alamine_Aminotransferase_Binary, Length: 68, dtype: int64

People whose Alanine Aminotransferase (ALT) is not in the recommended level, by gender¶

Gender
Female     45
Male      234
Name: Alamine_Aminotransferase_Binary, dtype: int64
[Figure: bar graph of people whose Alanine Aminotransferase (ALT) is not in the recommended level, by gender]

People with a normal Alanine Aminotransferase (ALT) level, by gender¶

Gender
Female     93
Male      194
Name: Alamine_Aminotransferase_Binary, dtype: int64
[Figure: bar graph of people whose Alanine Aminotransferase (ALT) is in the recommended level, by gender]
Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio ... Dataset_Details Gender_Binary Total_Bilirubin_Binary Total_Bilirubin_Description Direct_Bilirubin_Binary Direct_Bilirubin_Description Alkaline_Phosphotase_Binary Alkaline_Phosphotase_Description Alamine_Aminotransferase_Binary Alamine_Aminotransferase_Description
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 ... Patient with liver disease 0 1 Normal 1 Normal 0 Not in Normal Range 1 Normal
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 ... Patient with liver disease 1 0 Not in Normal Range 0 Not in Normal Range 0 Not in Normal Range 0 Not in Normal Range
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 ... Patient with liver disease 1 0 Not in Normal Range 0 Not in Normal Range 0 Not in Normal Range 0 Not in Normal Range
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 ... Patient with liver disease 1 1 Normal 1 Normal 0 Not in Normal Range 1 Normal
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 ... Patient with liver disease 1 0 Not in Normal Range 0 Not in Normal Range 0 Not in Normal Range 1 Normal
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
561 60 Male 0.5 0.1 500 20 34 5.9 1.6 0.37 ... Patient with no liver disease 1 1 Normal 1 Normal 0 Not in Normal Range 1 Normal
562 40 Male 0.6 0.1 98 35 31 6.0 3.2 1.10 ... Patient with liver disease 1 1 Normal 1 Normal 1 Normal 1 Normal
563 52 Male 0.8 0.2 245 48 49 6.4 3.2 1.00 ... Patient with liver disease 1 1 Normal 1 Normal 0 Not in Normal Range 0 Not in Normal Range
564 31 Male 1.3 0.5 184 29 32 6.8 3.4 1.00 ... Patient with liver disease 1 0 Not in Normal Range 0 Not in Normal Range 0 Not in Normal Range 1 Normal
565 38 Male 1.0 0.3 216 21 24 7.3 4.4 1.50 ... Patient with no liver disease 1 1 Normal 1 Normal 0 Not in Normal Range 1 Normal

566 rows × 21 columns

Normal Alanine Aminotransferase (ALT) level vs. Alanine Aminotransferase (ALT) not in the normal range, by gender¶

[Figure: bar graphs of normal Alanine Aminotransferase (ALT) level vs. Alanine Aminotransferase (ALT) not in the normal range, by gender]

Liver condition of the patient according to the Alanine Aminotransferase (ALT) level¶

People who have Alanine Aminotransferase (ALT) in the normal range and have liver disease¶

Gender
Female     54
Male      118
Name: Alamine_Aminotransferase_Binary, dtype: int64
[Figure: bar graph of people who have Alanine Aminotransferase (ALT) in the recommended level and have liver disease, by gender]

People who have Alanine Aminotransferase (ALT) in the normal range and do not have liver disease¶

Gender
Female    39
Male      76
Name: Alamine_Aminotransferase_Description, dtype: int64
[Figure: bar graph of people who have Alanine Aminotransferase (ALT) in the recommended level and do not have liver disease, by gender]

Alanine Aminotransferase (ALT) in the normal range with liver disease vs. Alanine Aminotransferase (ALT) in the normal range without liver disease¶

[Figure: bar graphs of Alanine Aminotransferase (ALT) in the normal range with liver disease vs. Alanine Aminotransferase (ALT) in the normal range without liver disease]

People who do not have Alanine Aminotransferase (ALT) in the normal range and have liver disease¶

Gender
Female     36
Male      196
Name: Alamine_Aminotransferase_Description, dtype: int64
[Figure: bar graph of people who do not have Alanine Aminotransferase (ALT) in the normal range and have liver disease, by gender]

People who do not have Alanine Aminotransferase (ALT) in the normal range and do not have liver disease¶

Gender
Female     9
Male      38
Name: Alamine_Aminotransferase_Description, dtype: int64
[Figure: bar graph of people who do not have Alanine Aminotransferase (ALT) in the normal range and do not have liver disease, by gender]

Alanine Aminotransferase (ALT) not in the normal range with liver disease vs. Alanine Aminotransferase (ALT) not in the normal range without liver disease¶

[Figure: bar graphs of Alanine Aminotransferase (ALT) not in the normal range with liver disease vs. Alanine Aminotransferase (ALT) not in the normal range without liver disease]

Further analysis and Data Visualization on SGOT Aspartate Aminotransferase (AST)¶

Creating a function based on the recommended range of Aspartate Aminotransferase¶
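Since every marker in this notebook ends up with a `_Binary` and a `_Description` column, one reusable helper could produce both. The sketch below is a hypothetical helper (`add_range_flags` is not a name from the notebook), and the range of 0 to 40 U/L for AST is an assumption that happens to agree with the sample rows (34 is flagged normal, 49 is not).

```python
import pandas as pd


def add_range_flags(df, column, low, high):
    """Return a copy of df with <column>_Binary and <column>_Description added.

    A value counts as normal when low <= value <= high.
    """
    in_range = df[column].between(low, high)  # inclusive on both ends
    out = df.copy()
    out[f"{column}_Binary"] = in_range.astype(int)
    out[f"{column}_Description"] = in_range.map(
        {True: "Normal", False: "Not in Normal Range"}
    )
    return out


# Sample values copied from the Aspartate_Aminotransferase column.
df = pd.DataFrame({"Aspartate_Aminotransferase": [18, 100, 68, 20, 59, 34, 31, 49, 32, 24]})

# Assumed range of 0 to 40 U/L; the notebook's own cutoff is not shown.
flagged = add_range_flags(df, "Aspartate_Aminotransferase", low=0, high=40)
print(flagged)
```

Returning a copy leaves the input DataFrame untouched, so the helper can be applied to several columns in turn without side effects.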

0       18
1      100
2       68
3       20
4       59
      ... 
561     34
562     31
563     49
564     32
565     24
Name: Aspartate_Aminotransferase, Length: 566, dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 22 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   Age                                   566 non-null    int64  
 1   Gender                                566 non-null    object 
 2   Total_Bilirubin                       566 non-null    float64
 3   Direct_Bilirubin                      566 non-null    float64
 4   Alkaline_Phosphotase                  566 non-null    int64  
 5   Alamine_Aminotransferase              566 non-null    int64  
 6   Aspartate_Aminotransferase            566 non-null    int64  
 7   Total_Protiens                        566 non-null    float64
 8   Albumin                               566 non-null    float64
 9   Albumin_and_Globulin_Ratio            566 non-null    float64
 10  Dataset                               566 non-null    int64  
 11  Dataset_Details                       566 non-null    object 
 12  Gender_Binary                         566 non-null    int64  
 13  Total_Bilirubin_Binary                566 non-null    int64  
 14  Total_Bilirubin_Description           566 non-null    object 
 15  Direct_Bilirubin_Binary               566 non-null    int64  
 16  Direct_Bilirubin_Description          566 non-null    object 
 17  Alkaline_Phosphotase_Binary           566 non-null    int64  
 18  Alkaline_Phosphotase_Description      566 non-null    object 
 19  Alamine_Aminotransferase_Binary       566 non-null    int64  
 20  Alamine_Aminotransferase_Description  566 non-null    object 
 21  Aspartate_Aminotransferase_Binary     566 non-null    int64  
dtypes: float64(5), int64(11), object(6)
memory usage: 97.4+ KB
Aspartate_Aminotransferase Aspartate_Aminotransferase_Binary
0 18 1
1 100 0
2 68 0
3 20 1
4 59 0
... ... ...
561 34 1
562 31 1
563 49 0
564 32 1
565 24 1

566 rows × 2 columns

Aspartate_Aminotransferase Aspartate_Aminotransferase_Binary Aspartate_Aminotransferase_Description
0 18 1 Normal
1 100 0 Not in Normal Range
2 68 0 Not in Normal Range
3 20 1 Normal
4 59 0 Not in Normal Range
... ... ... ...
561 34 1 Normal
562 31 1 Normal
563 49 0 Not in Normal Range
564 32 1 Normal
565 24 1 Normal

566 rows × 3 columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 23 columns):
 #   Column                                  Non-Null Count  Dtype  
---  ------                                  --------------  -----  
 0   Age                                     566 non-null    int64  
 1   Gender                                  566 non-null    object 
 2   Total_Bilirubin                         566 non-null    float64
 3   Direct_Bilirubin                        566 non-null    float64
 4   Alkaline_Phosphotase                    566 non-null    int64  
 5   Alamine_Aminotransferase                566 non-null    int64  
 6   Aspartate_Aminotransferase              566 non-null    int64  
 7   Total_Protiens                          566 non-null    float64
 8   Albumin                                 566 non-null    float64
 9   Albumin_and_Globulin_Ratio              566 non-null    float64
 10  Dataset                                 566 non-null    int64  
 11  Dataset_Details                         566 non-null    object 
 12  Gender_Binary                           566 non-null    int64  
 13  Total_Bilirubin_Binary                  566 non-null    int64  
 14  Total_Bilirubin_Description             566 non-null    object 
 15  Direct_Bilirubin_Binary                 566 non-null    int64  
 16  Direct_Bilirubin_Description            566 non-null    object 
 17  Alkaline_Phosphotase_Binary             566 non-null    int64  
 18  Alkaline_Phosphotase_Description        566 non-null    object 
 19  Alamine_Aminotransferase_Binary         566 non-null    int64  
 20  Alamine_Aminotransferase_Description    566 non-null    object 
 21  Aspartate_Aminotransferase_Binary       566 non-null    int64  
 22  Aspartate_Aminotransferase_Description  566 non-null    object 
dtypes: float64(5), int64(11), object(7)
memory usage: 101.8+ KB

People by age whose Aspartate Aminotransferase (AST) is not in the recommended level¶

Age
4     1
7     2
8     1
10    1
12    2
     ..
73    2
74    1
75    4
78    1
90    1
Name: Aspartate_Aminotransferase_Binary, Length: 65, dtype: int64

Top 10 ages by number of people whose Aspartate Aminotransferase (AST) is not in the recommended level¶

Age Count
49 60 21
22 32 16
35 45 15
38 48 12
45 55 11
30 40 11
28 38 11
40 50 10
32 42 9
36 46 9
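
A minimal sketch of how a ranking like the one above could be produced with pandas. The tiny frame is hypothetical stand-in data, and the convention that a `_Binary` value of 0 means "outside the recommended level" is an assumption inferred from the later outputs, not something the notebook states here.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the 566-row liver dataset.
df = pd.DataFrame({
    "Age": [60, 60, 60, 32, 32, 45, 45, 45, 45, 50],
    "Aspartate_Aminotransferase_Binary": [0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
})

# Assumption: 0 marks a value outside the recommended level.
not_normal = df[df["Aspartate_Aminotransferase_Binary"] == 0]

# Count flagged rows per age, turn the result into an Age/Count table,
# then keep the ten ages with the highest counts.
top_ages = (
    not_normal.groupby("Age")["Aspartate_Aminotransferase_Binary"]
    .count()
    .reset_index(name="Count")
    .sort_values("Count", ascending=False)
    .head(10)
)
print(top_ages)
```

Calling `reset_index` before sorting keeps the original row position as the index, which would explain the unsorted left-hand numbers in the table above.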

People by age with a normal Aspartate Aminotransferase (AST) level¶

Age
4      1
6      1
11     1
13     3
14     1
      ..
72     5
74     3
75    10
84     1
85     1
Name: Aspartate_Aminotransferase_Binary, Length: 61, dtype: int64

People whose Aspartate Aminotransferase (AST) is not in the recommended level, by gender¶

Gender
Female     49
Male      239
Name: Aspartate_Aminotransferase_Binary, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Aspartate Aminotransferase (AST) in recommended level as per the gender')

People with a normal Aspartate Aminotransferase (AST) level, by gender¶

Gender
Female     89
Male      189
Name: Aspartate_Aminotransferase_Binary, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Aspartate Aminotransferase (AST) in recommended level as per the gender')

Normal level Aspartate Aminotransferase (AST) vs Aspartate Aminotransferase (AST) not in normal level as per the gender¶

Text(0.5, 0.98, 'Bar graph of Normal level Aspartate Aminotransferase (AST) vs Aspartate Aminotransferase (AST) not in normal level as per the gender')
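
A side-by-side comparison like this one can be sketched with two matplotlib subplots and a shared figure title. The counts are copied from the two gender outputs above; the figure size, panel titles, and headless `Agg` backend are illustrative choices, not the notebook's own settings.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Counts copied from the gender outputs above.
normal = {"Female": 89, "Male": 189}
not_normal = {"Female": 49, "Male": 239}

# One row, two panels, sharing the y-axis so bar heights are comparable.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

ax_left.bar(list(normal.keys()), list(normal.values()))
ax_left.set_title("AST in normal range")
ax_left.set_ylabel("Number of people")

ax_right.bar(list(not_normal.keys()), list(not_normal.values()))
ax_right.set_title("AST not in normal range")

# suptitle places a single title above both panels.
fig.suptitle("Normal AST vs AST not in normal range, by gender")
```

Sharing the y-axis is a design choice: without it, each panel autoscales and the female and male bars can look misleadingly similar across panels.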

Liver condition of the patients according to the Aspartate Aminotransferase (AST) level¶

People who have Aspartate Aminotransferase (AST) in normal level and have liver disease¶

Gender
Female     52
Male      113
Name: Aspartate_Aminotransferase_Binary, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Aspartate Aminotransferase (AST) in recommended level and have liver disease as per the gender')

People who have Aspartate Aminotransferase (AST) in normal level and do not have liver disease¶

Gender
Female    37
Male      76
Name: Aspartate_Aminotransferase_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Aspartate Aminotransferase (AST) in recommended level and do not have liver disease as per the gender')

Aspartate Aminotransferase (AST) in normal level and having liver disease VS Aspartate Aminotransferase (AST) in normal level and not having a liver disease¶

Text(0.5, 0.98, 'Bar graph of Normal level Aspartate Aminotransferase (AST) in normal level and having liver disease VS Aspartate Aminotransferase (AST) in normal level and not having a liver disease')

People who do not have Aspartate Aminotransferase (AST) in normal level and have liver disease¶

Gender
Female     38
Male      201
Name: Aspartate_Aminotransferase_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Aspartate Aminotransferase (AST) in normal level and have liver disease as per the gender')

People who do not have Aspartate Aminotransferase (AST) in normal level and do not have liver disease as well¶

Gender
Female    11
Male      38
Name: Aspartate_Aminotransferase_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Aspartate Aminotransferase (AST) in normal level and do not have liver disease as per the gender')

Aspartate Aminotransferase (AST) not in normal level and having liver disease VS Aspartate Aminotransferase (AST) not in normal level and not having a liver disease¶

Text(0.5, 0.98, 'Bar graph of Aspartate Aminotransferase (AST) not in normal level and having liver disease VS Aspartate Aminotransferase (AST) not in normal level and not having a liver disease')
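
The four AST outputs above can be condensed into one comparison: the share of people with liver disease in each AST group. The sketch below only does arithmetic on the counts already printed (female and male summed); the dictionary layout and names are illustrative, and the result is descriptive, not a statistical test.

```python
# Counts copied from the four outputs above, summed over gender.
ast_counts = {
    "normal": {"disease": 52 + 113, "no_disease": 37 + 76},
    "not_normal": {"disease": 38 + 201, "no_disease": 11 + 38},
}

disease_share = {}
for status, counts in ast_counts.items():
    total = counts["disease"] + counts["no_disease"]
    # Fraction of this AST group that has liver disease.
    disease_share[status] = counts["disease"] / total
    print(f"{status}: {counts['disease']} of {total} have liver disease "
          f"({disease_share[status]:.1%})")
```

With these counts, about 59% of the normal-AST group and about 83% of the out-of-range group have liver disease.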

Further analysis and Data Visualization on Total Proteins¶

Creating a function based on the recommended range of Total Proteins¶

0      6.8
1      7.5
2      7.0
3      6.8
4      7.3
      ... 
561    5.9
562    6.0
563    6.4
564    6.8
565    7.3
Name: Total_Protiens, Length: 566, dtype: float64
Total_Protiens Total_Protiens_Binary
0 6.8 1
1 7.5 1
2 7.0 1
3 6.8 1
4 7.3 1
... ... ...
561 5.9 0
562 6.0 1
563 6.4 1
564 6.8 1
565 7.3 1

566 rows × 2 columns
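
The notebook's flagging function is not shown, so the following is only a sketch of one way to obtain a column like `Total_Protiens_Binary`. The 6.0 to 8.3 g/dL range is an assumption (a commonly quoted reference range that happens to reproduce the rows displayed above), and `total_protein_flag` is a hypothetical name.

```python
import pandas as pd

# Assumed reference range in g/dL; the notebook's actual cut-offs are not shown.
TOTAL_PROTEIN_LOW, TOTAL_PROTEIN_HIGH = 6.0, 8.3


def total_protein_flag(value: float) -> int:
    """Return 1 if the value lies in the assumed normal range, else 0."""
    return 1 if TOTAL_PROTEIN_LOW <= value <= TOTAL_PROTEIN_HIGH else 0


# A few values taken from the rows displayed above.
sample = pd.DataFrame({"Total_Protiens": [6.8, 7.5, 7.0, 5.9, 6.0]})
sample["Total_Protiens_Binary"] = sample["Total_Protiens"].apply(total_protein_flag)
print(sample)
```

Under this assumed range, 5.9 is flagged 0 and 6.0 is flagged 1, matching rows 561 and 562 above.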

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 24 columns):
 #   Column                                  Non-Null Count  Dtype  
---  ------                                  --------------  -----  
 0   Age                                     566 non-null    int64  
 1   Gender                                  566 non-null    object 
 2   Total_Bilirubin                         566 non-null    float64
 3   Direct_Bilirubin                        566 non-null    float64
 4   Alkaline_Phosphotase                    566 non-null    int64  
 5   Alamine_Aminotransferase                566 non-null    int64  
 6   Aspartate_Aminotransferase              566 non-null    int64  
 7   Total_Protiens                          566 non-null    float64
 8   Albumin                                 566 non-null    float64
 9   Albumin_and_Globulin_Ratio              566 non-null    float64
 10  Dataset                                 566 non-null    int64  
 11  Dataset_Details                         566 non-null    object 
 12  Gender_Binary                           566 non-null    int64  
 13  Total_Bilirubin_Binary                  566 non-null    int64  
 14  Total_Bilirubin_Description             566 non-null    object 
 15  Direct_Bilirubin_Binary                 566 non-null    int64  
 16  Direct_Bilirubin_Description            566 non-null    object 
 17  Alkaline_Phosphotase_Binary             566 non-null    int64  
 18  Alkaline_Phosphotase_Description        566 non-null    object 
 19  Alamine_Aminotransferase_Binary         566 non-null    int64  
 20  Alamine_Aminotransferase_Description    566 non-null    object 
 21  Aspartate_Aminotransferase_Binary       566 non-null    int64  
 22  Aspartate_Aminotransferase_Description  566 non-null    object 
 23  Total_Protiens_Binary                   566 non-null    int64  
dtypes: float64(5), int64(12), object(7)
memory usage: 106.2+ KB
Total_Protiens Total_Protiens_Binary Total_Protiens_Description
0 6.8 1 Normal
1 7.5 1 Normal
2 7.0 1 Normal
3 6.8 1 Normal
4 7.3 1 Normal
... ... ... ...
561 5.9 0 Not in Normal Range
562 6.0 1 Normal
563 6.4 1 Normal
564 6.8 1 Normal
565 7.3 1 Normal

566 rows × 3 columns
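
A description column like the one above can be derived from the binary flag with a dictionary lookup. This is a sketch, not necessarily the notebook's approach; the label strings are copied from the output, and the `labels` dictionary is an illustrative name.

```python
import pandas as pd

# Label strings copied from the output above.
labels = {1: "Normal", 0: "Not in Normal Range"}

sample = pd.DataFrame({
    "Total_Protiens": [6.8, 5.9],
    "Total_Protiens_Binary": [1, 0],
})

# Series.map replaces each flag with its label.
sample["Total_Protiens_Description"] = sample["Total_Protiens_Binary"].map(labels)
print(sample)
```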

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 25 columns):
 #   Column                                  Non-Null Count  Dtype  
---  ------                                  --------------  -----  
 0   Age                                     566 non-null    int64  
 1   Gender                                  566 non-null    object 
 2   Total_Bilirubin                         566 non-null    float64
 3   Direct_Bilirubin                        566 non-null    float64
 4   Alkaline_Phosphotase                    566 non-null    int64  
 5   Alamine_Aminotransferase                566 non-null    int64  
 6   Aspartate_Aminotransferase              566 non-null    int64  
 7   Total_Protiens                          566 non-null    float64
 8   Albumin                                 566 non-null    float64
 9   Albumin_and_Globulin_Ratio              566 non-null    float64
 10  Dataset                                 566 non-null    int64  
 11  Dataset_Details                         566 non-null    object 
 12  Gender_Binary                           566 non-null    int64  
 13  Total_Bilirubin_Binary                  566 non-null    int64  
 14  Total_Bilirubin_Description             566 non-null    object 
 15  Direct_Bilirubin_Binary                 566 non-null    int64  
 16  Direct_Bilirubin_Description            566 non-null    object 
 17  Alkaline_Phosphotase_Binary             566 non-null    int64  
 18  Alkaline_Phosphotase_Description        566 non-null    object 
 19  Alamine_Aminotransferase_Binary         566 non-null    int64  
 20  Alamine_Aminotransferase_Description    566 non-null    object 
 21  Aspartate_Aminotransferase_Binary       566 non-null    int64  
 22  Aspartate_Aminotransferase_Description  566 non-null    object 
 23  Total_Protiens_Binary                   566 non-null    int64  
 24  Total_Protiens_Description              566 non-null    object 
dtypes: float64(5), int64(12), object(8)
memory usage: 110.7+ KB

People by age whose Total Proteins is not in the recommended level¶

Age
6      1
13     1
14     1
16     1
18     3
19     1
20     1
21     1
22     3
24     1
25     1
26     6
27     2
28     3
29     3
30     5
31     1
32     8
33     3
34     4
35     2
36     5
37     7
38     3
39     2
40     6
42     9
44     1
45    12
46     4
47     1
48     9
49     4
50     5
51     2
52     1
53     2
54     1
55     5
56     2
57     5
58     5
60    11
61     2
62     4
63     1
64     1
65     9
66     4
68     2
69     1
70     4
72     1
73     1
74     2
75     9
Name: Total_Protiens_Binary, dtype: int64

Top 10 ages by number of people whose Total Proteins is not in the recommended level¶

Age Count
28 45 12
42 60 11
26 42 9
47 65 9
31 48 9
55 75 9
17 32 8
22 37 7
11 26 6
25 40 6

People by age with a normal Total Proteins level¶

Age
4     2
7     2
8     1
10    1
11    1
     ..
75    5
78    1
84    1
85    1
90    1
Name: Total_Protiens_Binary, Length: 71, dtype: int64

People whose Total Proteins is not in the recommended level, by gender¶

Gender
Female     48
Male      147
Name: Total_Protiens_Binary, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Total Protiens in recommended level as per the gender')

People with a normal Total Proteins level, by gender¶

Gender
Female     90
Male      281
Name: Total_Protiens_Binary, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Total Protiens in recommended level as per the gender')

Normal level Total Proteins vs Total Proteins not in normal level as per the gender¶

Text(0.5, 0.98, 'Bar graph of Normal level Total Protiens vs Total Protiens not in normal level as per the gender')

Liver condition of the patients according to the Total Proteins level¶

People who have Total Proteins in normal level and have liver disease¶

Gender
Female     60
Male      202
Name: Total_Protiens_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Total Protiens in recommended level and have liver disease as per the gender')

People who have Total Proteins in normal level and do not have liver disease¶

Gender
Female    30
Male      79
Name: Total_Protiens_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Total Protiens in recommended level and do not have liver disease as per the gender')

Total Proteins in normal level and having liver disease VS Total Proteins in normal level and not having a liver disease¶

Text(0.5, 0.98, 'Bar graph of Normal level Total Protiens in normal level and having liver disease VS  Total Protiens in normal level and not having a liver disease')

People who do not have Total Proteins in normal level and have liver disease¶

Gender
Female     30
Male      112
Name: Total_Protiens_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Total Protiens in normal level and have liver diesease as per the gender')

People who do not have Total Proteins in normal level and do not have liver disease as well¶

Gender
Female    18
Male      35
Name: Total_Protiens_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Total Protiens in normal level and do not have liver diesease as per the gender')

Total Proteins not in normal level and having liver disease VS Total Proteins not in normal level and not having a liver disease¶

Text(0.5, 0.98, 'Bar graph of Total Protiens not in normal level and having liver disease VS  Total Protiens not in normal level and not having a liver disease')
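
Each of the four disease-by-range counts above combines two conditions before grouping by gender. A minimal sketch of that filter follows, on hypothetical rows. It assumes `Dataset == 1` marks a liver patient (the coding used by the public Indian Liver Patient dataset, which this data appears to be) and that a flag of 0 means out of range.

```python
import pandas as pd

# Hypothetical rows; the real frame has 566.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Female"],
    "Dataset": [1, 1, 2, 1, 2],  # assumption: 1 = liver disease, 2 = no disease
    "Total_Protiens_Binary": [0, 0, 0, 1, 0],
})
df["Total_Protiens_Description"] = df["Total_Protiens_Binary"].map(
    {1: "Normal", 0: "Not in Normal Range"}
)

# Combine both conditions with &; each side needs its own parentheses.
mask = (df["Total_Protiens_Binary"] == 0) & (df["Dataset"] == 1)
not_normal_with_disease = (
    df[mask].groupby("Gender")["Total_Protiens_Description"].count()
)
print(not_normal_with_disease)
```

Counting the `_Description` column after grouping is what makes the printed Series carry the name `Total_Protiens_Description`, as in the outputs above.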

Further analysis and Data Visualization on Albumin¶

Creating a function based on the recommended range of Albumin¶

0      3.3
1      3.2
2      3.3
3      3.4
4      2.4
      ... 
561    1.6
562    3.2
563    3.2
564    3.4
565    4.4
Name: Albumin, Length: 566, dtype: float64
Albumin Albumin_Binary
0 3.3 0
1 3.2 0
2 3.3 0
3 3.4 0
4 2.4 0
... ... ...
561 1.6 0
562 3.2 0
563 3.2 0
564 3.4 0
565 4.4 1

566 rows × 2 columns
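
As a vectorized alternative to applying a Python function row by row, a flag like `Albumin_Binary` can be sketched with `Series.between`. The 3.5 to 5.5 g/dL range is an assumption chosen to be consistent with the rows displayed above (3.4 is flagged 0, 4.4 is flagged 1); the notebook's actual cut-offs are not shown.

```python
import pandas as pd

# Assumed reference range in g/dL, consistent with the displayed rows only.
ALBUMIN_LOW, ALBUMIN_HIGH = 3.5, 5.5

# Values taken from the rows displayed above.
sample = pd.DataFrame({"Albumin": [3.3, 3.2, 3.4, 2.4, 1.6, 4.4]})

# between() is inclusive on both ends by default and returns booleans;
# astype(int) converts True/False to 1/0.
sample["Albumin_Binary"] = (
    sample["Albumin"].between(ALBUMIN_LOW, ALBUMIN_HIGH).astype(int)
)
print(sample)
```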

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 26 columns):
 #   Column                                  Non-Null Count  Dtype  
---  ------                                  --------------  -----  
 0   Age                                     566 non-null    int64  
 1   Gender                                  566 non-null    object 
 2   Total_Bilirubin                         566 non-null    float64
 3   Direct_Bilirubin                        566 non-null    float64
 4   Alkaline_Phosphotase                    566 non-null    int64  
 5   Alamine_Aminotransferase                566 non-null    int64  
 6   Aspartate_Aminotransferase              566 non-null    int64  
 7   Total_Protiens                          566 non-null    float64
 8   Albumin                                 566 non-null    float64
 9   Albumin_and_Globulin_Ratio              566 non-null    float64
 10  Dataset                                 566 non-null    int64  
 11  Dataset_Details                         566 non-null    object 
 12  Gender_Binary                           566 non-null    int64  
 13  Total_Bilirubin_Binary                  566 non-null    int64  
 14  Total_Bilirubin_Description             566 non-null    object 
 15  Direct_Bilirubin_Binary                 566 non-null    int64  
 16  Direct_Bilirubin_Description            566 non-null    object 
 17  Alkaline_Phosphotase_Binary             566 non-null    int64  
 18  Alkaline_Phosphotase_Description        566 non-null    object 
 19  Alamine_Aminotransferase_Binary         566 non-null    int64  
 20  Alamine_Aminotransferase_Description    566 non-null    object 
 21  Aspartate_Aminotransferase_Binary       566 non-null    int64  
 22  Aspartate_Aminotransferase_Description  566 non-null    object 
 23  Total_Protiens_Binary                   566 non-null    int64  
 24  Total_Protiens_Description              566 non-null    object 
 25  Albumin_Binary                          566 non-null    int64  
dtypes: float64(5), int64(13), object(8)
memory usage: 115.1+ KB
Albumin Albumin_Binary Albumin_Description
0 3.3 0 Not in Normal Range
1 3.2 0 Not in Normal Range
2 3.3 0 Not in Normal Range
3 3.4 0 Not in Normal Range
4 2.4 0 Not in Normal Range
... ... ... ...
561 1.6 0 Not in Normal Range
562 3.2 0 Not in Normal Range
563 3.2 0 Not in Normal Range
564 3.4 0 Not in Normal Range
565 4.4 1 Normal

566 rows × 3 columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 27 columns):
 #   Column                                  Non-Null Count  Dtype  
---  ------                                  --------------  -----  
 0   Age                                     566 non-null    int64  
 1   Gender                                  566 non-null    object 
 2   Total_Bilirubin                         566 non-null    float64
 3   Direct_Bilirubin                        566 non-null    float64
 4   Alkaline_Phosphotase                    566 non-null    int64  
 5   Alamine_Aminotransferase                566 non-null    int64  
 6   Aspartate_Aminotransferase              566 non-null    int64  
 7   Total_Protiens                          566 non-null    float64
 8   Albumin                                 566 non-null    float64
 9   Albumin_and_Globulin_Ratio              566 non-null    float64
 10  Dataset                                 566 non-null    int64  
 11  Dataset_Details                         566 non-null    object 
 12  Gender_Binary                           566 non-null    int64  
 13  Total_Bilirubin_Binary                  566 non-null    int64  
 14  Total_Bilirubin_Description             566 non-null    object 
 15  Direct_Bilirubin_Binary                 566 non-null    int64  
 16  Direct_Bilirubin_Description            566 non-null    object 
 17  Alkaline_Phosphotase_Binary             566 non-null    int64  
 18  Alkaline_Phosphotase_Description        566 non-null    object 
 19  Alamine_Aminotransferase_Binary         566 non-null    int64  
 20  Alamine_Aminotransferase_Description    566 non-null    object 
 21  Aspartate_Aminotransferase_Binary       566 non-null    int64  
 22  Aspartate_Aminotransferase_Description  566 non-null    object 
 23  Total_Protiens_Binary                   566 non-null    int64  
 24  Total_Protiens_Description              566 non-null    object 
 25  Albumin_Binary                          566 non-null    int64  
 26  Albumin_Description                     566 non-null    object 
dtypes: float64(5), int64(13), object(9)
memory usage: 119.5+ KB

People by age whose Albumin is not in the recommended level¶

Age
4      1
6      1
7      1
8      1
16     1
      ..
74     2
75    13
78     1
84     1
90     1
Name: Albumin_Binary, Length: 64, dtype: int64

Top 10 ages by number of people whose Albumin is not in the recommended level¶

Age Count
46 60 28
19 32 17
42 55 17
29 42 16
32 45 15
51 65 14
60 75 13
35 48 13
33 46 13
37 50 12

People by age with a normal Albumin level¶

Age
4     1
7     1
10    1
11    1
12    2
     ..
70    1
72    1
74    2
75    1
85    1
Name: Albumin_Binary, Length: 65, dtype: int64

People whose Albumin is not in the recommended level, by gender¶

Gender
Female     77
Male      283
Name: Albumin_Binary, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Albumin in recommended level as per the gender')

People with a normal Albumin level, by gender¶

Gender
Female     61
Male      145
Name: Albumin_Binary, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Albumin in recommended level as per the gender')

Normal level Albumin vs Albumin not in normal level as per the gender¶

Text(0.5, 0.98, 'Bar graph of Normal level Albumin vs Albumin not in normal level as per the gender')

Liver condition of the patients according to the Albumin level¶

People who have Albumin in normal level and have liver disease¶

Gender
Female    38
Male      87
Name: Albumin_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Albumin in recommended level and have liver disease as per the gender')

People who have Albumin in normal level and do not have liver disease¶

Gender
Female    23
Male      58
Name: Albumin_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Albumin in recommended level and do not have liver disease as per the gender')

Albumin in normal level and having liver disease VS Albumin in normal level and not having a liver disease¶

Text(0.5, 0.98, 'Bar graph of Normal level Albumin in normal level and having liver disease VS  Albumin in normal level and not having a liver disease')

People who do not have Albumin in normal level and have liver disease¶

Gender
Female     52
Male      227
Name: Albumin_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Albumin in normal level and have liver diesease as per the gender')

People who do not have Albumin in normal level and do not have liver disease as well¶

Gender
Female    25
Male      56
Name: Albumin_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Albumin in normal level and do not have liver diesease as per the gender')

Albumin not in normal level and having liver disease VS Albumin not in normal level and not having a liver disease¶

Text(0.5, 0.98, 'Bar graph of Albumin not in normal level and having liver disease VS  Albumin not in normal level and not having a liver disease')

Creating a function based on the recommended range of Albumin and Globulin Ratio¶

0      0.90
1      0.74
2      0.89
3      1.00
4      0.40
       ... 
561    0.37
562    1.10
563    1.00
564    1.00
565    1.50
Name: Albumin_and_Globulin_Ratio, Length: 566, dtype: float64
Albumin_and_Globulin_Ratio Albumin_and_Globulin_Ratio_Binary
0 0.90 0
1 0.74 0
2 0.89 0
3 1.00 0
4 0.40 0
... ... ...
561 0.37 0
562 1.10 0
563 1.00 0
564 1.00 0
565 1.50 1

566 rows × 2 columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 28 columns):
 #   Column                                  Non-Null Count  Dtype  
---  ------                                  --------------  -----  
 0   Age                                     566 non-null    int64  
 1   Gender                                  566 non-null    object 
 2   Total_Bilirubin                         566 non-null    float64
 3   Direct_Bilirubin                        566 non-null    float64
 4   Alkaline_Phosphotase                    566 non-null    int64  
 5   Alamine_Aminotransferase                566 non-null    int64  
 6   Aspartate_Aminotransferase              566 non-null    int64  
 7   Total_Protiens                          566 non-null    float64
 8   Albumin                                 566 non-null    float64
 9   Albumin_and_Globulin_Ratio              566 non-null    float64
 10  Dataset                                 566 non-null    int64  
 11  Dataset_Details                         566 non-null    object 
 12  Gender_Binary                           566 non-null    int64  
 13  Total_Bilirubin_Binary                  566 non-null    int64  
 14  Total_Bilirubin_Description             566 non-null    object 
 15  Direct_Bilirubin_Binary                 566 non-null    int64  
 16  Direct_Bilirubin_Description            566 non-null    object 
 17  Alkaline_Phosphotase_Binary             566 non-null    int64  
 18  Alkaline_Phosphotase_Description        566 non-null    object 
 19  Alamine_Aminotransferase_Binary         566 non-null    int64  
 20  Alamine_Aminotransferase_Description    566 non-null    object 
 21  Aspartate_Aminotransferase_Binary       566 non-null    int64  
 22  Aspartate_Aminotransferase_Description  566 non-null    object 
 23  Total_Protiens_Binary                   566 non-null    int64  
 24  Total_Protiens_Description              566 non-null    object 
 25  Albumin_Binary                          566 non-null    int64  
 26  Albumin_Description                     566 non-null    object 
 27  Albumin_and_Globulin_Ratio_Binary       566 non-null    int64  
dtypes: float64(5), int64(14), object(9)
memory usage: 123.9+ KB
Albumin_and_Globulin_Ratio Albumin_and_Globulin_Ratio_Binary Albumin_and_Globulin_Ratio_Description
0 0.90 0 Not in Normal Range
1 0.74 0 Not in Normal Range
2 0.89 0 Not in Normal Range
3 1.00 0 Not in Normal Range
4 0.40 0 Not in Normal Range
... ... ... ...
561 0.37 0 Not in Normal Range
562 1.10 0 Not in Normal Range
563 1.00 0 Not in Normal Range
564 1.00 0 Not in Normal Range
565 1.50 1 Normal

566 rows × 3 columns
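
Because the same binary-plus-description pattern recurs for every measurement, it could be factored into one helper. The sketch below is hypothetical: `add_range_columns` is an illustrative name, and the 1.5 to 2.5 range is an assumption chosen only because it reproduces the rows displayed above (1.10 flagged 0, 1.50 flagged 1).

```python
import numpy as np
import pandas as pd


def add_range_columns(df: pd.DataFrame, column: str,
                      low: float, high: float) -> pd.DataFrame:
    """Return a copy of df with <column>_Binary and <column>_Description added."""
    out = df.copy()
    in_range = out[column].between(low, high)  # inclusive on both ends
    out[f"{column}_Binary"] = in_range.astype(int)
    out[f"{column}_Description"] = np.where(
        in_range, "Normal", "Not in Normal Range"
    )
    return out


# Values taken from the rows displayed above.
sample = pd.DataFrame({"Albumin_and_Globulin_Ratio": [0.90, 0.74, 1.10, 1.50]})

# 1.5 to 2.5 is an assumed range, not the notebook's confirmed cut-offs.
flagged = add_range_columns(sample, "Albumin_and_Globulin_Ratio", 1.5, 2.5)
print(flagged)
```

Returning a copy instead of modifying the frame in place keeps the helper safe to re-run in a notebook, where cells are often executed more than once.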

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 29 columns):
 #   Column                                  Non-Null Count  Dtype  
---  ------                                  --------------  -----  
 0   Age                                     566 non-null    int64  
 1   Gender                                  566 non-null    object 
 2   Total_Bilirubin                         566 non-null    float64
 3   Direct_Bilirubin                        566 non-null    float64
 4   Alkaline_Phosphotase                    566 non-null    int64  
 5   Alamine_Aminotransferase                566 non-null    int64  
 6   Aspartate_Aminotransferase              566 non-null    int64  
 7   Total_Protiens                          566 non-null    float64
 8   Albumin                                 566 non-null    float64
 9   Albumin_and_Globulin_Ratio              566 non-null    float64
 10  Dataset                                 566 non-null    int64  
 11  Dataset_Details                         566 non-null    object 
 12  Gender_Binary                           566 non-null    int64  
 13  Total_Bilirubin_Binary                  566 non-null    int64  
 14  Total_Bilirubin_Description             566 non-null    object 
 15  Direct_Bilirubin_Binary                 566 non-null    int64  
 16  Direct_Bilirubin_Description            566 non-null    object 
 17  Alkaline_Phosphotase_Binary             566 non-null    int64  
 18  Alkaline_Phosphotase_Description        566 non-null    object 
 19  Alamine_Aminotransferase_Binary         566 non-null    int64  
 20  Alamine_Aminotransferase_Description    566 non-null    object 
 21  Aspartate_Aminotransferase_Binary       566 non-null    int64  
 22  Aspartate_Aminotransferase_Description  566 non-null    object 
 23  Total_Protiens_Binary                   566 non-null    int64  
 24  Total_Protiens_Description              566 non-null    object 
 25  Albumin_Binary                          566 non-null    int64  
 26  Albumin_Description                     566 non-null    object 
 27  Albumin_and_Globulin_Ratio_Binary       566 non-null    int64  
 28  Albumin_and_Globulin_Ratio_Description  566 non-null    object 
dtypes: float64(5), int64(14), object(10)
memory usage: 128.4+ KB

People by age whose Albumin and Globulin Ratio is not in the recommended level¶

Age
4      2
6      1
7      2
8      1
10     1
      ..
75    14
78     1
84     1
85     1
90     1
Name: Albumin_and_Globulin_Ratio_Binary, Length: 71, dtype: int64

Top 10 ages by number of people whose Albumin and Globulin Ratio is not in the recommended level¶

Age Count
52 60 34
38 45 24
43 50 21
35 42 20
41 48 19
31 38 19
25 32 18
48 55 18
57 65 17
39 46 16

People by age who have a normal Albumin and Globulin Ratio level¶

Age
15    1
17    2
24    1
25    2
27    1
28    2
29    1
31    1
32    2
33    1
35    1
37    1
38    1
40    1
43    1
48    1
49    2
50    2
53    1
54    1
62    2
63    1
66    1
68    1
70    1
Name: Albumin_and_Globulin_Ratio_Binary, dtype: int64

People having Albumin and Globulin Ratio not in recommended level as per the gender¶

Gender
Female    130
Male      404
Name: Albumin_and_Globulin_Ratio_Binary, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Albumin and Globulin Ratio in recommended level as per the gender')

People having a normal Albumin and Globulin Ratio level as per the gender¶

Gender
Female     8
Male      24
Name: Albumin_and_Globulin_Ratio_Binary, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Albumin and Globulin Ratio in recommended level as per the gender')

Normal level Albumin and Globulin Ratio vs Albumin and Globulin Ratio not in normal level as per the gender¶

Text(0.5, 0.98, 'Bar graph of Normal level Albumin and Globulin Ratio vs Albumin and Globulin Ratio not in normal level as per the gender')

Liver condition of the patients according to their Albumin and Globulin Ratio¶

People who have Albumin and Globulin Ratio in normal level and have liver disease¶

Gender
Female     5
Male      15
Name: Albumin_and_Globulin_Ratio_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Albumin and Globulin Ratio in recommended level and have liver disease as per the gender')

People who have Albumin and Globulin Ratio in normal level and do not have liver disease¶

Gender
Female    3
Male      9
Name: Albumin_and_Globulin_Ratio_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who have Albumin and Globulin Ratio in recommended level and do not have liver disease as per the gender')

Albumin and Globulin Ratio in normal level and having liver disease VS Albumin and Globulin Ratio in normal level and not having a liver disease¶

Text(0.5, 0.98, 'Bar graph of Normal level Albumin and Globulin Ratio in normal level and having liver disease VS  Albumin in normal level and not having a liver disease')

People who do not have Albumin and Globulin Ratio in normal level and have liver disease¶

Gender
Female     85
Male      299
Name: Albumin_and_Globulin_Ratio_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Albumin and Globulin Ratio in normal level and have liver diesease as per the gender')

People who do not have Albumin and Globulin Ratio in normal level and do not have liver disease either¶

Gender
Female     45
Male      105
Name: Albumin_and_Globulin_Ratio_Description, dtype: int64
Text(0.5, 1.0, 'Bar graph representing people who do not have Albumin and Globulin Ratio in normal level and do not have liver diesease as per the gender')

Albumin and Globulin Ratio not in normal level and having liver disease VS Albumin and Globulin Ratio not in normal level and not having a liver disease¶

Text(0.5, 0.98, 'Bar graph of Albumin and Globulin Ratio not in normal level and having liver disease VS  Albumin and Globulin Ratio not in normal level and not having a liver disease')

Creating the pairplot according to the liver and non-liver patients¶

<seaborn.axisgrid.PairGrid at 0x25bdf4700d0>

Creating the pairplot according to the gender¶

<seaborn.axisgrid.PairGrid at 0x25bee3a5af0>

The pairplot comprises two kinds of plot: histograms and scatter plots. Each histogram on the diagonal shows the distribution of a single variable. The scatter plots in the upper and lower triangles show the relationship between two variables.


Determining Strong Correlation between the Attributes¶

Correlation between Total Bilirubin & Direct Bilirubin¶

Viewing Pearsons correlation between Total Bilirubin & Direct Bilirubin¶

Pearsons correlation between Total Bilirubin & Direct Bilirubin: 0.874 and p-value: 6.802449119535963e-179

There is a positive, very strong correlation (0.874) between the two features namely Total Bilirubin and Direct Bilirubin of the dataset.
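The printed coefficient and p-value have the shape returned by `scipy.stats.pearsonr`. The following is a minimal sketch of that calculation, assuming SciPy is the library in use; the two arrays are synthetic stand-ins, not the notebook's columns.

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic stand-in data (an assumption; these are not the dataset's values).
rng = np.random.default_rng(0)
total = rng.gamma(shape=2.0, scale=1.5, size=200)
direct = 0.45 * total + rng.normal(scale=0.3, size=200)

# pearsonr returns the correlation coefficient and the two-sided p-value.
r, p_value = pearsonr(total, direct)
print(f"Pearson's correlation: {r:.3f} and p-value: {p_value}")
```

With the real DataFrame, the two arguments would be the columns being compared, for example `Total_Bilirubin` and `Direct_Bilirubin`.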

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 566 entries, 0 to 565
Data columns (total 29 columns):
 #   Column                                  Non-Null Count  Dtype  
---  ------                                  --------------  -----  
 0   Age                                     566 non-null    int64  
 1   Gender                                  566 non-null    object 
 2   Total_Bilirubin                         566 non-null    float64
 3   Direct_Bilirubin                        566 non-null    float64
 4   Alkaline_Phosphotase                    566 non-null    int64  
 5   Alamine_Aminotransferase                566 non-null    int64  
 6   Aspartate_Aminotransferase              566 non-null    int64  
 7   Total_Protiens                          566 non-null    float64
 8   Albumin                                 566 non-null    float64
 9   Albumin_and_Globulin_Ratio              566 non-null    float64
 10  Dataset                                 566 non-null    int64  
 11  Dataset_Details                         566 non-null    object 
 12  Gender_Binary                           566 non-null    int64  
 13  Total_Bilirubin_Binary                  566 non-null    int64  
 14  Total_Bilirubin_Description             566 non-null    object 
 15  Direct_Bilirubin_Binary                 566 non-null    int64  
 16  Direct_Bilirubin_Description            566 non-null    object 
 17  Alkaline_Phosphotase_Binary             566 non-null    int64  
 18  Alkaline_Phosphotase_Description        566 non-null    object 
 19  Alamine_Aminotransferase_Binary         566 non-null    int64  
 20  Alamine_Aminotransferase_Description    566 non-null    object 
 21  Aspartate_Aminotransferase_Binary       566 non-null    int64  
 22  Aspartate_Aminotransferase_Description  566 non-null    object 
 23  Total_Protiens_Binary                   566 non-null    int64  
 24  Total_Protiens_Description              566 non-null    object 
 25  Albumin_Binary                          566 non-null    int64  
 26  Albumin_Description                     566 non-null    object 
 27  Albumin_and_Globulin_Ratio_Binary       566 non-null    int64  
 28  Albumin_and_Globulin_Ratio_Description  566 non-null    object 
dtypes: float64(5), int64(14), object(10)
memory usage: 128.4+ KB

Correlation between Alamine Aminotransferase & Aspartate Aminotransferase¶

Viewing Pearsons correlation between Alamine Aminotransferase & Aspartate Aminotransferase¶

Pearsons correlation between Alamine Aminotransferase & Aspartate Aminotransferase: 0.792 and p-value: 7.764611510959146e-123

There is a positive, strong correlation between the two features namely Alamine Aminotransferase and Aspartate Aminotransferase of the dataset.

Correlation between Total Protiens & Albumin¶

Viewing Pearsons correlation between Total Protiens & Albumin¶

Pearsons correlation between Total Protiens & Albumin  0.784 and p-value: 8.711597342630112e-119

There is a positive, strong correlation between the two features namely Total Protiens and Albumin of the dataset.

Correlation between Albumin and Globulin Ratio & Albumin¶

Viewing Pearsons correlation between Albumin and Globulin Ratio & Albumin¶

Pearsons correlation between Albumin and Globulin Ratio & Albumin 0.687 and p-value: 2.1793959764618003e-80

There is a positive, strong correlation between the two features namely Albumin and Albumin and Globulin Ratio of the dataset.

Correlation between Total Protiens & Albumin and Globulin Ratio¶

Viewing Pearsons correlation between Total Protiens & Albumin and Globulin Ratio¶

Pearsons correlation between Albumin and Globulin Ratio & Total_Protiens 0.235 and p-value: 1.6291299405240473e-08

There is a positive, weak correlation (0.235) between the two features namely Total Protiens and Albumin and Globulin Ratio of the dataset.

Note:

  • +1.0 : Perfect positive + association
  • +0.8 to 1.0 : Very strong + association
  • +0.6 to 0.8 : Strong + association
  • +0.4 to 0.6 : Moderate + association
  • +0.2 to 0.4 : Weak + association
  • 0.0 to +0.2 : Very weak + or no association
  • 0.0 to -0.2 : Very weak - or no association
  • -0.2 to -0.4 : Weak - association
  • -0.4 to -0.6 : Moderate - association
  • -0.6 to -0.8 : Strong - association
  • -0.8 to -1.0 : Very strong - association
  • -1.0 : Perfect negative association
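The scale in the note can be written as a small lookup function. This is a sketch only: the function name `describe_correlation` is hypothetical, and because the listed bands share their endpoints, it assumes that a boundary value such as 0.8 belongs to the stronger band.

```python
def describe_correlation(r: float) -> str:
    """Map a Pearson coefficient to the wording used in the note."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")

    strength = abs(r)
    if strength < 0.2:
        return "very weak or no association"

    # Assumption: boundary values fall into the stronger band.
    if strength == 1.0:
        label = "perfect"
    elif strength >= 0.8:
        label = "very strong"
    elif strength >= 0.6:
        label = "strong"
    elif strength >= 0.4:
        label = "moderate"
    else:
        label = "weak"

    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} association"


print(describe_correlation(0.874))  # very strong positive association
print(describe_correlation(0.235))  # weak positive association
```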

Verdict from the correlation plots¶

Conclusion

  • There is a positive, very strong correlation between the two features namely Total Bilirubin and Direct Bilirubin of the dataset.
  • There is a positive, strong correlation between the two features namely Alamine Aminotransferase and Aspartate Aminotransferase of the dataset.
  • There is a positive, strong correlation between the two features namely Total Protiens and Albumin of the dataset.
  • There is a positive, strong correlation between the two features namely Albumin and Albumin and Globulin Ratio of the dataset.

3D scatter plot which represents the age of patients with Direct Bilirubin and Total Bilirubin as per their gender¶

[3D scatter plot: Age v/s Direct_Bilirubin v/s Total_Bilirubin, colored by Gender (Female, Male)]

3D scatter plot which represents the age of patients with Alkaline Phosphotase and Alamine Aminotransferase as per their gender¶

[3D scatter plot: Age v/s Alkaline_Phosphotase v/s Alamine_Aminotransferase, colored by Gender (Female, Male)]

3D scatter plot which represents the age of patients with Total Protiens and Albumin and Globulin Ratio as per their gender¶

[3D scatter plot: Age v/s Total_Protiens v/s Albumin_and_Globulin_Ratio, colored by Gender (Female, Male)]

3D scatter plot which represents the age of patients with Albumin and Aspartate Aminotransferase as per their gender¶

[3D scatter plot: Age v/s Albumin v/s Aspartate_Aminotransferase, colored by Gender (Female, Male)]

Dealing with the outliers¶

Viewing the dataframe¶

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 583 entries, 0 to 582
Data columns (total 11 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Age                         583 non-null    int64  
 1   Gender                      583 non-null    object 
 2   Total_Bilirubin             583 non-null    float64
 3   Direct_Bilirubin            583 non-null    float64
 4   Alkaline_Phosphotase        583 non-null    int64  
 5   Alamine_Aminotransferase    583 non-null    int64  
 6   Aspartate_Aminotransferase  583 non-null    int64  
 7   Total_Protiens              583 non-null    float64
 8   Albumin                     583 non-null    float64
 9   Albumin_and_Globulin_Ratio  579 non-null    float64
 10  Dataset                     583 non-null    int64  
dtypes: float64(5), int64(5), object(1)
memory usage: 50.2+ KB

Boxplot of Total Bilirubin and Direct Bilirubin¶

Text(0.5, 0.98, 'Boxplot of Total Bilirubin | Direct Bilirubin')

Boxplot of Total Protiens, Albumin, and Albumin and Globulin Ratio¶

Text(0.5, 0.98, 'Boxplot of Total Protiens | Albumin | Albumin and Globulin Ratio')

Boxplot of Aspartate Aminotransferase, Alamine Aminotransferase, and Alkaline Phosphotase¶

Text(0.5, 0.98, 'Boxplot of Aspartate Aminotransferase | Alamine Aminotransferase | Alkaline Phosphotase')

Machine learning models analysis¶

Here, the goal is to predict whether or not a patient has liver disease. Hence, the outcome column (Dataset) will be the y label and the rest of the columns will be the X, or input, data.

Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1
... ... ... ... ... ... ... ... ... ... ... ...
578 60 Male 0.5 0.1 500 20 34 5.9 1.6 0.37 2
579 40 Male 0.6 0.1 98 35 31 6.0 3.2 1.10 1
580 52 Male 0.8 0.2 245 48 49 6.4 3.2 1.00 1
581 31 Male 1.3 0.5 184 29 32 6.8 3.4 1.00 1
582 38 Male 1.0 0.3 216 21 24 7.3 4.4 1.50 2

583 rows × 11 columns

Making copy of the DataFrame¶

Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1
... ... ... ... ... ... ... ... ... ... ... ...
578 60 Male 0.5 0.1 500 20 34 5.9 1.6 0.37 2
579 40 Male 0.6 0.1 98 35 31 6.0 3.2 1.10 1
580 52 Male 0.8 0.2 245 48 49 6.4 3.2 1.00 1
581 31 Male 1.3 0.5 184 29 32 6.8 3.4 1.00 1
582 38 Male 1.0 0.3 216 21 24 7.3 4.4 1.50 2

583 rows × 11 columns

Converting the Dataset column to binary values¶

Dataset column

  • 1 - Patient with liver disease, which stays as 1.
  • 2 - Patient with no liver disease, which is represented as 0.
Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset Dataset_Binary
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1 1
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1 1
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1 1
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1 1
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1 1
... ... ... ... ... ... ... ... ... ... ... ... ...
578 60 Male 0.5 0.1 500 20 34 5.9 1.6 0.37 2 0
579 40 Male 0.6 0.1 98 35 31 6.0 3.2 1.10 1 1
580 52 Male 0.8 0.2 245 48 49 6.4 3.2 1.00 1 1
581 31 Male 1.3 0.5 184 29 32 6.8 3.4 1.00 1 1
582 38 Male 1.0 0.3 216 21 24 7.3 4.4 1.50 2 0

583 rows × 12 columns
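The recoding shown in the table can be sketched with a dictionary passed to `Series.map`. The tiny DataFrame here is a stand-in, and using `map` is an assumption about how the notebook built `Dataset_Binary`.

```python
import pandas as pd

# Stand-in for the real DataFrame: only the outcome column is needed here.
df = pd.DataFrame({"Dataset": [1, 1, 2, 1, 2]})

# 1 (liver disease) stays 1; 2 (no disease) becomes 0.
df["Dataset_Binary"] = df["Dataset"].map({1: 1, 2: 0})
print(df)
```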

Dropping the Dataset column¶

Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset_Binary
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1

Duplicating the rows in the DataFrame¶

Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset_Binary
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1
... ... ... ... ... ... ... ... ... ... ... ...
1161 60 Male 0.5 0.1 500 20 34 5.9 1.6 0.37 0
1162 40 Male 0.6 0.1 98 35 31 6.0 3.2 1.10 1
1163 52 Male 0.8 0.2 245 48 49 6.4 3.2 1.00 1
1164 31 Male 1.3 0.5 184 29 32 6.8 3.4 1.00 1
1165 38 Male 1.0 0.3 216 21 24 7.3 4.4 1.50 0

1166 rows × 11 columns

Dropping null values from the dataset¶

Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset_Binary
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1
... ... ... ... ... ... ... ... ... ... ... ...
1161 60 Male 0.5 0.1 500 20 34 5.9 1.6 0.37 0
1162 40 Male 0.6 0.1 98 35 31 6.0 3.2 1.10 1
1163 52 Male 0.8 0.2 245 48 49 6.4 3.2 1.00 1
1164 31 Male 1.3 0.5 184 29 32 6.8 3.4 1.00 1
1165 38 Male 1.0 0.3 216 21 24 7.3 4.4 1.50 0

1158 rows × 11 columns
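The row counts above fit a simple pattern: 583 rows doubled gives 1166, and the 4 rows with a missing Albumin_and_Globulin_Ratio appear twice, so dropping them leaves 1158. A minimal sketch of both steps on a toy frame, assuming `pd.concat` and `dropna` were used (the notebook's own code is not shown):

```python
import numpy as np
import pandas as pd

# Toy frame with one missing ratio (stand-in values).
toy = pd.DataFrame({
    "Age": [65, 62, 58],
    "Albumin_and_Globulin_Ratio": [0.90, np.nan, 1.00],
})

# Stack the frame on itself; ignore_index=True renumbers rows 0..5.
doubled = pd.concat([toy, toy], ignore_index=True)

# Drop every row that contains a null; the index is not renumbered.
cleaned = doubled.dropna()

print(doubled.shape, cleaned.shape)  # (6, 2) (4, 2)
```

Because each row now appears twice, a later random split can place one copy in the training set and the other in the testing set, which tends to make test accuracy look better than it would on unseen patients.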

Converting Gender to binary (Female: 0, Male: 1) in a new column called Gender_Binary¶

Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset_Binary Gender_Binary
0 65 Female 0.7 0.1 187 16 18 6.8 3.3 0.90 1 0
1 62 Male 10.9 5.5 699 64 100 7.5 3.2 0.74 1 1
2 62 Male 7.3 4.1 490 60 68 7.0 3.3 0.89 1 1
3 58 Male 1.0 0.4 182 14 20 6.8 3.4 1.00 1 1
4 72 Male 3.9 2.0 195 27 59 7.3 2.4 0.40 1 1

Dropping column Gender¶

Age Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Dataset_Binary Gender_Binary
0 65 0.7 0.1 187 16 18 6.8 3.3 0.90 1 0
1 62 10.9 5.5 699 64 100 7.5 3.2 0.74 1 1
2 62 7.3 4.1 490 60 68 7.0 3.3 0.89 1 1
3 58 1.0 0.4 182 14 20 6.8 3.4 1.00 1 1
4 72 3.9 2.0 195 27 59 7.3 2.4 0.40 1 1

Storing all columns except the outcome (Dataset_Binary) in x¶

Age Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Gender_Binary
0 65 0.7 0.1 187 16 18 6.8 3.3 0.90 0
1 62 10.9 5.5 699 64 100 7.5 3.2 0.74 1
2 62 7.3 4.1 490 60 68 7.0 3.3 0.89 1
3 58 1.0 0.4 182 14 20 6.8 3.4 1.00 1
4 72 3.9 2.0 195 27 59 7.3 2.4 0.40 1

Storing the outcome (Dataset_Binary) column in y¶

0    1
1    1
2    1
3    1
4    1
Name: Dataset_Binary, dtype: int64
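The last few steps (encoding Gender, dropping the text column, and separating inputs from the outcome) can be sketched together as follows. The three-row frame is a stand-in, and the exact pandas calls are assumptions rather than the notebook's own code.

```python
import pandas as pd

# Stand-in rows using the column names from the tables above.
df = pd.DataFrame({
    "Age": [65, 62, 62],
    "Gender": ["Female", "Male", "Male"],
    "Total_Bilirubin": [0.7, 10.9, 7.3],
    "Dataset_Binary": [1, 1, 1],
})

# Encode Gender as 0/1, then drop the original text column.
df["Gender_Binary"] = df["Gender"].map({"Female": 0, "Male": 1})
df = df.drop(columns=["Gender"])

# x holds every input column; y holds only the outcome.
x = df.drop(columns=["Dataset_Binary"])
y = df["Dataset_Binary"]
print(x.columns.tolist())
```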

Splitting data into training and testing sets¶

Viewing the number of rows in the training set and the testing set¶

(926, 232)
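The (926, 232) output is consistent with an 80/20 split of the 1158 remaining rows. A minimal sketch, assuming scikit-learn's `train_test_split` with `test_size=0.2`; the `random_state` value and the placeholder arrays are assumptions, not taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same number of rows as the cleaned data.
x = np.arange(1158 * 2).reshape(1158, 2)
y = np.tile([0, 1], 579)

# Hold out 20% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

print((len(x_train), len(x_test)))  # (926, 232)
```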

StandardScaler¶

StandardScaler standardizes each feature by removing its mean and scaling it to unit variance, so that the transformed feature has a mean of 0 and a standard deviation of 1.
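As a short illustration of the scaling step, the sketch here fits scikit-learn's `StandardScaler` on a few stand-in training rows and reuses the fitted statistics on the test rows. Fitting on the training set only is a common convention and an assumption about the notebook's workflow.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in feature rows (Age, Total_Bilirubin).
x_train = np.array([[65, 0.7], [62, 10.9], [58, 1.0], [72, 3.9]])
x_test = np.array([[60, 0.5], [40, 0.6]])

scaler = StandardScaler()
# Learn each column's mean and standard deviation from the training rows.
x_train_scaled = scaler.fit_transform(x_train)
# Apply the same training statistics to the test rows.
x_test_scaled = scaler.transform(x_test)

print(x_train_scaled.mean(axis=0), x_train_scaled.std(axis=0))
```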




Logistic Regression¶

LogisticRegression()
Accuracy obtained by Logistic Regression model: 77.58620689655173

Confusion Matrix of Logistic Regression¶

Text(0.5, 1.03, 'Confusion Matrix for Logistic Regression')

Classification report of Logistic Regression¶

              precision    recall  f1-score   support

           0       0.58      0.36      0.45        58
           1       0.81      0.91      0.86       174

    accuracy                           0.78       232
   macro avg       0.70      0.64      0.65       232
weighted avg       0.75      0.78      0.76       232
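The figures in the report above can be reproduced from a 2x2 confusion matrix. The four counts in this sketch are not printed anywhere in the notebook; they are inferred here from the report's support, recall, and accuracy values (58 and 174 test rows, 180 of 232 correct), so treat them as an assumption.

```python
# Rows are the true class, columns are the predicted class.
tn, fp = 21, 37    # true class 0: predicted 0, predicted 1
fn, tp = 15, 159   # true class 1: predicted 0, predicted 1

# Precision: of the rows predicted as a class, how many were right.
precision_0 = tn / (tn + fn)
precision_1 = tp / (tp + fp)

# Recall: of the rows truly in a class, how many were found.
recall_0 = tn / (tn + fp)
recall_1 = tp / (tp + fn)

# F1 is the harmonic mean of precision and recall.
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

accuracy = (tn + tp) / (tn + fp + fn + tp)

print(f"class 0: {precision_0:.2f} {recall_0:.2f} {f1_0:.2f}")
print(f"class 1: {precision_1:.2f} {recall_1:.2f} {f1_1:.2f}")
print(f"accuracy: {accuracy * 100:.2f}")
```

The low recall for class 0 shows that most patients without liver disease are predicted as having it, which the overall accuracy alone does not reveal.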

AUC-ROC Curve of Logistic Regression¶
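An ROC curve plots the true positive rate against the false positive rate across decision thresholds, and the AUC summarizes it as a single number. The following is a hedged sketch of how those values can be computed with scikit-learn; the data comes from `make_classification` and every parameter value is an assumption, so the resulting AUC says nothing about the liver dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (not the liver dataset).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The ROC curve needs scores, so use the probability of class 1.
scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)

print(f"AUC: {auc:.3f}")
```

Plotting `fpr` against `tpr` gives the curve itself.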

Random Forest Classifier¶

RandomForestClassifier()
Accuracy obtained by Random Forest Classifier model: 93.96551724137932

Confusion Matrix of Random Forest Classifier¶

Classification report of Random Forest Classifier¶

              precision    recall  f1-score   support

           0       0.84      0.93      0.89        58
           1       0.98      0.94      0.96       174

    accuracy                           0.94       232
   macro avg       0.91      0.94      0.92       232
weighted avg       0.94      0.94      0.94       232

AUC-ROC Curve of Random Forest Classifier¶

K Neighbors Classifier¶

KNeighborsClassifier(n_neighbors=4)
Accuracy obtained by K Neighbors Classifier model: 70.6896551724138

Confusion matrix of K Neighbors Classifier¶

Text(0.5, 1.03, 'Confusion Matrix for K Neighbors Classifier')

Classification report of K Neighbors Classifier¶

              precision    recall  f1-score   support

           0       0.45      0.72      0.55        58
           1       0.88      0.70      0.78       174

    accuracy                           0.71       232
   macro avg       0.67      0.71      0.67       232
weighted avg       0.77      0.71      0.72       232

AUC-ROC Curve of K Neighbors Classifier¶

Decision Tree Classifier¶

DecisionTreeClassifier()
Accuracy obtained by Decision Tree Classifier model: 91.37931034482759

Confusion matrix of Decision Tree Classifier¶

Text(0.5, 1.03, 'Confusion Matrix for Decision Tree Classifier')

Classification Report of Decision Tree Classifier¶

              precision    recall  f1-score   support

           0       0.77      0.93      0.84        58
           1       0.98      0.91      0.94       174

    accuracy                           0.91       232
   macro avg       0.87      0.92      0.89       232
weighted avg       0.92      0.91      0.92       232

AUC-ROC Curve of Decision Tree Classifier¶

Cat Boost Classifier¶

Learning rate set to 0.5
0:	learn: 0.6142972	total: 157ms	remaining: 1.41s
1:	learn: 0.5602948	total: 161ms	remaining: 643ms
2:	learn: 0.5207494	total: 163ms	remaining: 380ms
3:	learn: 0.5010780	total: 165ms	remaining: 247ms
4:	learn: 0.4765250	total: 167ms	remaining: 167ms
5:	learn: 0.4622501	total: 169ms	remaining: 112ms
6:	learn: 0.4535824	total: 172ms	remaining: 73.6ms
7:	learn: 0.4415049	total: 173ms	remaining: 43.4ms
8:	learn: 0.4315656	total: 176ms	remaining: 19.5ms
9:	learn: 0.4249749	total: 177ms	remaining: 0us
<catboost.core.CatBoostClassifier at 0x25b88401a00>
Accuracy obtained by CatBoost Classifier model: 80.60344827586206

Confusion matrix of Cat Boost Classifier¶

Text(0.5, 1.03, 'Confusion Matrix for CatBoost Classifier')

Classification Report of Cat Boost Classifier¶

              precision    recall  f1-score   support

           0       0.64      0.50      0.56        58
           1       0.84      0.91      0.88       174

    accuracy                           0.81       232
   macro avg       0.74      0.70      0.72       232
weighted avg       0.79      0.81      0.80       232

AUC-ROC Curve of Cat Boost Classifier¶

Gradient Boosting Classifier¶

GradientBoostingClassifier()
Accuracy obtained by Gradient Boosting Classifier model: 88.79310344827587

Confusion matrix of Gradient Boosting Classifier¶

Text(0.5, 1.03, 'Confusion Matrix for Gradient Boosting Classifier')

Classification Report of Gradient Boosting Classifier¶

              precision    recall  f1-score   support

           0       0.83      0.69      0.75        58
           1       0.90      0.95      0.93       174

    accuracy                           0.89       232
   macro avg       0.87      0.82      0.84       232
weighted avg       0.88      0.89      0.88       232

AUC-ROC Curve of Gradient Boosting Classifier¶

Support vector machine¶

SVC(probability=True)
Accuracy obtained by Support vector machine: 76.29310344827587

Confusion matrix of Support vector machine¶

Text(0.5, 1.03, 'Confusion Matrix for Support vector machine')

Classification Report of Support vector machine¶

              precision    recall  f1-score   support

           0       1.00      0.05      0.10        58
           1       0.76      1.00      0.86       174

    accuracy                           0.76       232
   macro avg       0.88      0.53      0.48       232
weighted avg       0.82      0.76      0.67       232

AUC-ROC Curve of Support vector machine¶

Comparison between the models as per the accuracy¶

Text(0.5, 1.03, 'Model Comparison - Model Accuracy')

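The above bar graph shows that the Random Forest Classifier performs best on the test set, with an accuracy of about 94%, followed by the Decision Tree Classifier (about 91%) and the Gradient Boosting Classifier (about 89%).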
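The ranking behind the comparison can be checked directly from the accuracies printed in the sections above. This sketch only sorts those reported numbers; the dictionary layout is an illustrative choice, not the notebook's code.

```python
# Test-set accuracies (in percent) as printed for each model above.
accuracies = {
    "Logistic Regression": 77.58620689655173,
    "Random Forest Classifier": 93.96551724137932,
    "K Neighbors Classifier": 70.6896551724138,
    "Decision Tree Classifier": 91.37931034482759,
    "CatBoost Classifier": 80.60344827586206,
    "Gradient Boosting Classifier": 88.79310344827587,
    "Support vector machine": 76.29310344827587,
}

# Sort from highest to lowest accuracy.
ranked = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)
best_model, best_accuracy = ranked[0]

for name, accuracy in ranked:
    print(f"{name:30s} {accuracy:6.2f}")
```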